COMMISSIONER FOR PATENTS

P.O. Box 1450 

Alexandria, VA 22313-1450 

Dear Sir: 

The Applicants appeal the rejection of Claims 1-5 in the above-captioned patent 
application. These claims were rejected in a Final Office Action dated March 30, 2005. In 
response to Applicants' Amendment and Response to Final Office Action, the Examiner issued 
an Advisory Action dated June 17, 2005. Applicants mailed a Notice of Appeal on August 1, 2005.

I. REAL PARTY IN INTEREST 

Pursuant to 37 C.F.R. § 41.37(c)(1), Appellants hereby notify the Board of Patent Appeals
and Interferences that the real party in interest is the assignee of record for this application, 
Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080. 

II. RELATED APPEALS AND INTERFERENCES 

A Notice of Appeal has been filed in the related Application Nos. 10/063,653, 
10/063,659, 10/063,660, 10/063,661, 10/063,713, 10/063,534, 10/063,540, 10/063,560, 
10/063,578, 10/063,584, 10/063,586, 10/063,587, 10/063,616, 10/063,648, 10/063,652. A
Notice of Appeal and an Appeal Brief have also been filed in the related Application Nos. 
10/063,578 and 10/063,584. Appellants are unaware of any other related appeals or interferences. 

III. STATUS OF THE CLAIMS 

The above-captioned application was filed with Claims 1-6. Claim 6 was canceled in an 
Amendment and Response to Office Action mailed September 27, 2004. Claims 1-5 were finally 
rejected by the Examiner in a final Office Action mailed March 30, 2005. Appellants mailed an 
Amendment and Response to Final Office Action on May 27, 2005, amending Claim 1. The 
Examiner issued an Advisory Action dated June 17, 2005, which did not indicate whether the 
claim amendment and other evidence submitted in the After Final Amendment would be entered 
for purposes of appeal. Appellants confirmed in telephone conferences with the Examiner on 
June 23, 2005, and his Supervisor on August 2, 2005, that the amendments and exhibits were
entered by the Examiner. Accordingly, Claims 1-5 are the subject of this appeal. The claims at 
issue are attached hereto as Appendix A. 

IV. STATUS OF AMENDMENTS 

Appellants mailed an Amendment and Response to Final Office Action on May 27, 2005, 
amending Claim 1. The Examiner issued an Advisory Action dated June 17, 2005, which did not 
indicate whether the claim amendment and other evidence submitted in the After Final 
Amendment would be entered for purposes of appeal. Appellants confirmed in telephone 
conferences with the Examiner on June 23, 2005, and his Supervisor on August 2, 2005, that the 
amendments and exhibits were entered by the Examiner. 

V. SUMMARY OF THE CLAIMED SUBJECT MATTER 

The claimed subject matter relates to isolated antibodies which specifically bind to the 
polypeptide having SEQ ID NO: 34. As amended, independent Claim 1 reads: 

1. An isolated antibody that specifically binds to the polypeptide of SEQ ID NO: 34. 

Various aspects of the claimed antibody are described in the specification at, for example, 
paragraphs [0024], [0225], [0238]-[0248], and [0361]-[0405]. SEQ ID NO: 34 is disclosed in the
Sequence Listing appended to the application. 

VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL

The Examiner has rejected Claims 1-5 under 35 U.S.C. §101, stating that the claimed 
invention is not supported by either a specific and substantial asserted utility or a
well-established utility.

The Examiner has also rejected Claims 1-5 under 35 U.S.C. §112, first paragraph. The 
Examiner asserts that since the claimed invention is not supported by either a specific or 
substantial asserted utility or a well-established utility, one skilled in the art clearly would not 
know how to use the claimed invention. 

Claims 1-5 can be considered as a group for purposes of the utility and enablement 
rejections. 

VII. APPELLANTS' ARGUMENT 

A. Summary of Arguments 

1. Utility Rejection 

The first issue before the Board is whether Appellants have asserted at least one "specific, 
substantial, and credible utility." See Examination Guidelines ("Utility Guidelines"), 66 Fed. 
Reg. 1092 (2001). Appellants have asserted that the claimed antibodies to the polypeptide of 
SEQ ID NO: 34 (the PRO1277 polypeptide) are useful as diagnostic tools for cancer, particularly
for esophageal and skin cancer. This asserted utility is specific, substantial, and credible. 

Briefly stated, Appellants' asserted utility is based on the disclosure in Example 18 of the
instant application that the mRNA encoding the PRO1277 polypeptide is more highly expressed
in normal esophageal and skin tissue compared to esophageal and melanoma tumor, respectively.
It is well-established that there is a reasonable correlation between changes in mRNA level for a
particular gene and a corresponding change in the level of expression of the encoded polypeptide,
such that increasing or decreasing the amount of mRNA for a particular gene leads to a
corresponding increase or decrease in the amount of the encoded protein. Thus, one of skill in
the art would be more likely than not to believe that, like the PRO1277 mRNA, the PRO1277
protein is more highly expressed in normal esophageal and skin tissue compared to esophageal
and melanoma tumor, respectively. This differential expression of PRO1277 polypeptide is
useful for distinguishing esophageal or melanoma tumor tissue from its normal tissue counterpart.
Therefore, the claimed antibodies to the PRO1277 polypeptide have a specific, substantial and
credible utility as diagnostic tools for cancer, particularly esophageal and skin cancer, as is
explained in more detail below.

2. Enablement Rejection 

The second issue before the Board is whether Appellants have enabled the pending 
claims such that one of skill in the art would be able to make and use the claimed invention. The 
Examiner has rejected Claims 1-5 under 35 U.S.C. §112, first paragraph, arguing that because 
the claimed invention is not supported by either a specific or substantial asserted utility or a
well-established utility, one skilled in the art clearly would not know how to use the claimed invention.
See First Office Action at 8; Final Office Action at 3. 

For the reasons discussed in detail below, the claimed invention is supported by a specific, 
substantial and credible utility. Because the lack of a supporting utility is the only basis for the 
Examiner's rejection under 35 U.S.C. § 112, first paragraph, the Board should reverse the 
rejection of Claims 1-5 as lacking enablement. 

B. Utility Rejection - Detailed Arguments 

The first issue before the Board is whether Appellants have asserted at least one "specific, 
substantial, and credible utility." See Examination Guidelines, 66 Fed. Reg. 1092 (2001). 
Appellants have asserted that the claimed antibodies to the polypeptide of SEQ ID NO: 34 (the 
PRO1277 polypeptide) are useful as diagnostic tools for cancer, particularly for esophageal and
skin cancer. This asserted utility is specific, substantial, and credible, as is explained in more 
detail below. 

1. Utility - Legal Standard

A "specific utility" is defined as utility which is "specific to the subject matter claimed," 
in contrast to "a general utility that would be applicable to the broad class of the invention." See 
M.P.E.P. § 2107.01 I. For example, it is generally not enough to state that a nucleic acid is
useful as a diagnostic tool without also identifying the condition that is to be diagnosed. 

The requirement of "substantial utility" defines a "real world" use, and derives from the
Supreme Court's holding in Brenner v. Manson, 383 U.S. 519, 534 (1966) stating that "[t]he 
basic quid pro quo contemplated by the Constitution and the Congress for granting a patent 
monopoly is the benefit derived by the public from an invention with substantial utility." In 
explaining the "substantial utility" standard, M.P.E.P. § 2107.01 cautions, however, that Office 
personnel must be careful not to interpret the phrase "immediate benefit to the public" or similar 
formulations used in certain court decisions to mean that products or services based on the 
claimed invention must be "currently available" to the public in order to satisfy the utility 
requirement. "Rather, any reasonable use that an applicant has identified for the invention that 
can be viewed as providing a public benefit should be accepted as sufficient, at least with regard 
to defining a 'substantial' utility." M.P.E.P. § 2107.01 (emphasis added). 

Indeed, the Guidelines for Examination of Applications for Compliance With the Utility
Requirement, set forth in M.P.E.P. § 2107 II(B)(1), give the following instruction to patent
examiners: "If the applicant has asserted that the claimed invention is useful for any particular 
practical purpose ... and the assertion would be considered credible by a person of ordinary skill 
in the art, do not impose a rejection based on lack of utility." 

Finally, in assessing the credibility of the asserted utility, the M.P.E.P. states that "to 
overcome the presumption of truth that an assertion of utility by the applicant enjoys" the PTO 
must establish that it is "more likely than not that one of ordinary skill in the art would doubt (i.e., 
'question') the truth of the statement of utility." M.P.E.P. § 2107.02 III A. 

2. Utility - Burden of Proof 

It is well established that a specification which contains a disclosure of utility which 
corresponds in scope to the subject matter sought to be patented "must be taken as sufficient to 
satisfy the utility requirement of § 101 for the entire claimed subject matter unless there is reason 
for one skilled in the art to question the objective truth of the statement of utility or its scope." In 
re Langer, 503 F.2d 1380, 1391, 183 U.S.P.Q. 288, 297 (C.C.P.A. 1974). Thus "the PTO has the
initial burden of challenging a presumptively correct assertion of utility in the disclosure." In re 
Brana, 51 F.3d 1560, 1566, 34 U.S.P.Q.2d 1436 (Fed. Cir. 1995). Only after the PTO provides 
evidence showing that one of ordinary skill in the art would reasonably doubt the asserted utility 
does the burden shift to the applicant to provide rebuttal evidence sufficient to convince such a
person of the invention's asserted utility. Id. 

3. Utility - Standard of Proof 

Compliance with 35 U.S.C. § 101 is a question of fact. Raytheon v. Roper, 724 F.2d 951,
956, 220 U.S.P.Q. 592, 596 (Fed. Cir. 1983). The evidentiary standard to be used throughout ex
parte examination in setting forth a rejection is a preponderance of the evidence, or "more likely
than not" standard. In re Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d 1443, 1444 (Fed. Cir.
1992). This is stated explicitly in the M.P.E.P.:

[T]he applicant does not have to provide evidence sufficient to establish that an 
asserted utility is true "beyond a reasonable doubt." Nor must the applicant 
provide evidence such that it establishes an asserted utility as a matter of 
statistical certainty. Instead, evidence will be sufficient if, considered as a whole, 
it leads a person of ordinary skill in the art to conclude that the asserted utility is 
more likely than not true. M.P.E.P. § 2107.02, part VII (emphasis in original,
citations omitted). 

The Court of Appeals for the Federal Circuit has stated that the standard for satisfying the 
utility requirement is a low one: 

The threshold of utility is not high: An invention is "useful" under section 101 if
it is capable of providing some identifiable benefit. See Brenner v. Manson, 383 
U.S. 519, 534, 86 S.Ct. 1033, 16 L.Ed.2d 69 (1966); Brooktree Corp. v. Advanced 
Micro Devices, Inc., 977 F.2d 1555, 1571 (Fed. Cir. 1992) ("To violate § 101 the 
claimed device must be totally incapable of achieving a useful result"); Fuller v. 
Berger, 120 F. 274, 275 (7th Cir. 1903) (test for utility is whether invention "is 
incapable of serving any beneficial end"). Juicy Whip, Inc. v. Orange Bang, Inc., 
185 F.3d 1364, 1366, 51 U.S.P.Q.2d 1700 (Fed. Cir. 1999) (emphasis added).

The low threshold for satisfying the utility requirement is reflected in the standard set by the 
Federal Circuit for invalidating a patent based on a lack of utility: "[T]he fact that an invention 
has only limited utility and is only operable in certain applications is not grounds for finding lack 
of utility. Some degree of utility is sufficient for patentability. Further, the defense of non- 
utility cannot be sustained without proof of total incapacity." Envirotech Corp. v. Al George,
Inc., 730 F.2d 753, 762, 221 U.S.P.Q. 473 (Fed. Cir. 1984) (emphasis added, citations omitted). 

Because the standard for satisfying the utility requirement is so low, requiring total 
incapacity for a finding of no utility, the M.P.E.P. cautions that: 

Rejections under 35 U.S.C. 101 have been rarely sustained by federal courts.
Generally speaking, in these rare cases, the 35 U.S.C. 101 rejection was sustained 
[] because the applicant . . . asserted a utility that could only be true if it violated a 
scientific principle, such as the second law of thermodynamics, or a law of nature, 
or was wholly inconsistent with contemporary knowledge in the art. M.P.E.P. §
2107.02 III B., citing In re Gazave, 379 F.2d 973, 978, 154 U.S.P.Q. 92, 96 
(C.C.P.A. 1967) (underline emphasis in original, italic emphasis added). 

4. Appellants Asserted a Specific, Substantial and Credible Utility that is
Sufficient to Satisfy the Utility Requirement of § 101

The claimed subject matter relates to antibodies which specifically bind to the 
polypeptide having SEQ ID NO: 34. The polypeptide of SEQ ID NO: 34 (referred to as
"PRO1277 polypeptide") is encoded by the polynucleotide of SEQ ID NO: 33 (also referred to
as DNA56869-1545). Specification at [0059-0060]. Appellants have asserted that the 
claimed antibodies are useful as diagnostic tools for cancer, particularly esophageal and skin 
cancer. 

In "Example 18: Tumor Versus Normal Differential Tissue Expression Distribution" 
Appellants disclose that the mRNA encoding PRO1277 polypeptide is more highly expressed in
normal esophageal and skin tissue compared to esophageal and melanoma tumor, respectively.
Specification at ¶¶ [0529-0530] and accompanying tables. As explained in paragraph [0530], the
differential expression of the PRO1277 mRNA was detected using the well-established
technique of quantitative PCR amplification of cDNA libraries isolated from different human
normal and tumor tissue samples. To ensure that equivalent amounts of nucleic acid were used
in each reaction, the cDNA for β-actin was used as a control.

The specification teaches that identification of the differential expression of a PRO
polypeptide-encoding mRNA in one or more tumor tissues as compared to one or more normal
tissues of the same tissue type "renders the molecule useful diagnostically for the determination
of the presence or absence of tumor in a subject suspected of possessing a tumor." Specification
at ¶ [0530]. Because it is well established that changes in mRNA levels lead to changes in the
level of the encoded protein, one would expect the PRO1277 protein to be differentially
expressed in esophageal and skin tumors. The specification states that PRO polypeptides "may
also be used diagnostically for tissue typing, wherein the PRO polypeptides of the present
invention may be differentially expressed in one tissue as compared to another, preferably in a
diseased tissue as compared to a normal tissue of the same tissue type." Specification at ¶ [0336].

Appl. No.: 10/063,540
Filed: May 2, 2002

Likewise, the specification discloses the use of antibodies to PRO polypeptides as diagnostic
tools: 

[A]nti-PRO antibodies may be used in diagnostic assays for PRO [polypeptide], 
e.g., detecting its expression (and in some cases, differential expression) in 
specific cells, tissues, or serum. Various diagnostic assay techniques known in 
the art may be used, such as competitive binding assays, direct or indirect 
sandwich assays and immunoprecipitation assays conducted in either 
heterogeneous or homogeneous phases. Specification at [0407]. 

Taken together, the specification clearly discloses the use of the claimed antibodies as 
diagnostic tools for cancer, particularly esophageal and skin cancer. This utility is substantial, as 
one of skill in the art will recognize that the diagnosis of cancer is a "real world" use; it is 
specific, as the diagnosis of esophageal and skin cancer is not a utility that applies to the broad 
class of antibodies; and it is credible, as it is not a utility "that could only be true if it violated a
scientific principle, ... or a law of nature, or [is] wholly inconsistent with contemporary
knowledge in the art." M.P.E.P. § 2107.02 III B., citing In re Gazave, 379 F.2d 973, 978, 154 
U.S.P.Q. 92, 96 (C.C.P.A. 1967). Because Appellants' specification contains a disclosure of 
utility which corresponds in scope to the claimed subject matter, the asserted utility "must be 
taken as sufficient to satisfy the utility requirement of § 101 for the entire claimed subject matter 
unless there is reason for one skilled in the art to question the objective truth of the statement of 
utility or its scope." In re Langer, 503 F.2d 1380, 1391, 183 U.S.P.Q. 288, 297 (C.C.P.A. 1974).
Therefore, the burden of establishing a prima facie case of lack of utility rests with the PTO. See
In re Brana, 51 F.3d 1560, 1566, 34 U.S.P.Q.2d 1436 (Fed. Cir. 1995) ("the PTO has the initial 
burden of challenging a presumptively correct assertion of utility in the disclosure"). 

5. The Data in Example 18 are Data Regarding Differential mRNA Levels, not 
Gene Amplification 

Appellants begin by clarifying that the data concerning the differential expression of the
PRO1277 gene presented in Example 18 relate to gene expression, not gene amplification. The
description of Example 18 makes clear that the results were obtained by quantitative PCR
amplification of cDNA libraries. It is well known in the art that cDNA libraries are made from
mRNA, and reflect the level of mRNA for a particular gene in the source tissue. Thus, Example
18 is reporting a measure of the expression of the PRO1277 gene, i.e. mRNA levels, not its
amplification, i.e. the number of copies of the PRO1277 gene in the genome.

For this reason, the relationship between gene amplification or aneuploidy (an abnormal
number of chromosomes) and the level of gene expression, i.e. mRNA levels, is not relevant to 
the instant application. The distinction between gene amplification and gene expression has been 
clarified for the Examiner on numerous occasions, as has the irrelevance of any relationship 
between aneuploidy and gene expression. 

Despite this fact, throughout the final Office Action, the Examiner refers to Example 18 
as "amplification data." For example, the Examiner states "[g]iven the increase in amplified 
DNA (DNA copy number) for PRO1277...." Office Action at 5 (emphasis added). 1 Similarly,
the Examiner states that "the instant specification provides no information regarding differential
mRNA levels of PRO1277.... The specification describes only gene amplification data." Id. at
12 (emphasis added). In rejecting the declaration of Dr. Polakis, the Examiner again states that
"the instant specification provides no information regarding differential mRNA levels of
PRO1277.... Only gene amplification data were presented." Id. at 13-14 (emphasis added).

The Examiner also continues to cite Pennica et al. (Proc. Natl. Acad. Sci. USA (1998)
95:14717-14722) to teach that "what is often seen is a lack of correlation between DNA
amplification and increased gene expression" and that "it does not necessarily follow that an
increase in gene copy number results in increased gene expression." Id. at 4 and 16 (emphasis
added). And although the Examiner agrees that Pennica does not discuss the relationship
between the mRNA level and the level of protein expression, the only relationship at issue in the
instant application, the Examiner continues to cite Pennica "to show the lack of correlation of
between DNA amplification and gene expression." Id. at 16 (emphasis in original). Elsewhere
the Examiner states that "one cannot determine from the data in the specification whether the
observed 'amplification' of nucleic acid is due to increase in copy number, or alternatively due to
increase in transcription rates." Id. at 12 (emphasis added).

The Examiner illustrates this confusion by citing Sen et al. (Curr. Opin. Oncol. (2000)
12:82-88) in the first Office Action, where the Examiner stated that:

Cancerous tissue is known to be aneuploid, that is, having an abnormal number of
chromosomes (see Sen, 2000, Curr. Opin. Oncol. 12: 82-88). The data presented
in the specification were not corrected for aneuploidy. A slight amplification of a
gene does not necessarily mean overexpression in a tissue, but can merely be an
indication that the cancer tissue is aneuploid. First Office Action at 6-7.

1 Citations to "Office Action" refer to the final Office Action mailed March 30, 2005,
unless indicated otherwise.

Whether or not gene amplification leads to increased gene expression is completely 
irrelevant to the instant application. Likewise, whether the differential mRNA expression of the 
PRO1277 gene reported in Example 18 is due to an increase or decrease in copy number, or
alternatively due to an increase or decrease in transcription rates is simply not relevant.
Appellants have provided reliable evidence that the PRO1277 mRNA is differentially expressed
in certain tumors by examining cDNA libraries, not genomic DNA. Whether this differential
mRNA expression is due to aneuploidy, changes in gene copy number, transcription rates, a
combination of these, or some other known or unknown cellular mechanism is simply not
relevant to Appellants' asserted utility. Regardless of the cause, the differential expression of
PRO1277 mRNA and the resulting differential expression of PRO1277 protein can be used as a
molecular marker of esophageal and skin cancer to assist in the diagnosis of this disease.

In response to the above arguments, the Examiner has stated that "[t]hese arguments have 
been fully considered but are found to be partially persuasive because the Office accepts that 
aneuploidy has no relevance to the differential expression of the cDNA of the instant invention." 
Office Action at 15 (emphasis added). However, the Examiner then goes on to state that 
Appellants' assertion that the Examiner is confusing the difference between increased copy 
number, i.e. gene amplification, on the one hand, and increased gene expression, i.e. increased 
mRNA levels, on the other was "fully considered but not found to be persuasive." Office Action 
at 16. The Examiner then cites Pennica et al. as discussed above. 

The issues before the Board will be greatly simplified if an understanding can be reached 
that the data in Example 18 reflect the level of mRNA for PRO1277 expressed in the tissues
tested, i.e., the level of PRO1277 gene expression, and not the number of copies of the PRO1277
gene present in the genome of the tissue tested, i.e., gene amplification. Once this is established,
it is clear that the Sen and Pennica references are irrelevant, and Appellants' references and 
declarations regarding the relationship between mRNA levels and protein levels are of 
unquestionable relevance. 

6. The Examiner's Arguments

In the first Office Action, dated August 3, 2004, the Examiner rejected the pending 
claims, stating "Claims 1-6 are rejected under 35 U.S.C. 101 because the claimed invention is 
not supported by either a specific, substantial and credible asserted utility or a well established 
utility." First Office Action at 3. This rejection is maintained in the final Office Action. Office 
Action at 3. 

To establish a prima facie showing that the claimed subject matter lacks utility, the 
Examiner must "provide[] evidence showing that one of ordinary skill in the art would 
reasonably doubt the asserted utility." In re Brana, 51 F.3d 1560, 1566, 34 U.S.P.Q.2d 1436 
(Fed. Cir. 1995). The Examiner has issued a first Office Action, a final Office Action, and an 
Advisory Action during the prosecution of the instant application. None of these papers provide 
any evidence that one of ordinary skill in the art would reasonably doubt the asserted utility. 

As discussed above, the Examiner has made a number of irrelevant arguments regarding 
an asserted lack of correlation between gene amplification and an increase in gene expression, as 
well as the role of aneuploidy in cancer, citing the Sen and Pennica et al references. These 
arguments are irrelevant for the reasons discussed above, and will not be addressed in any more 
detail below. The Examiner's remaining arguments are summarized below. 

The Examiner states that the "polynucleotide (cDNA)" for PRO1277 is disclosed to be
more highly expressed in normal esophagus and skin compared to the esophageal and melanoma 
tumor based on the PCR amplification of cDNA libraries, and that the use of the molecule for 
diagnosis is also disclosed. However, the Examiner rejects this utility, stating that "[t]here is no 
supporting evidence to indicate that the polypeptide encoded by the polynucleotide of the instant 
invention is more highly expressed in normal tissues compared to their tumor tissue counterparts, 
and as such one of skilled in the art would conclude that it is not supported by a substantial 
asserted utility or a well-established utility." First Office Action at 5. The Examiner raises 
essentially two arguments to reject the asserted utility. 

First, the Examiner challenges the sufficiency of the data presented in Example 18. The 
Examiner argues that the evidence of differential expression of the PRO1277 mRNA in
esophageal and melanoma tumors is insufficient because it does not teach what the normal level 
of expression is, it does not indicate how high the expression level is compared to the disease 
tissue, it lacks statistical correlation, there is no data to compare expression in the normal and 
disease samples, and that because the normal and tumor samples are not from the same person,
there is no possibility of direct comparison between the normal and tumor samples. See First
Office Action at 5-6. The Examiner also cites Hu et al. (J. Proteome Res., (2003) 2(4):405-12) to
support his assertion that the literature cautions researchers against drawing conclusions based on
small changes in transcript expression levels between normal and cancerous tissue. Final Office
Action at 5, 13, 14, and 16. 

Second, the Examiner argues that "polypeptide levels cannot be accurately predicted 
from mRNA levels," citing Haynes et al. (Electrophoresis, (1998) 19(11):1862-71) and Chen et
al (Mol. and Cell. Proteomics, (2002) 1:304-313) for support. Office Action at 4, 14, and 16. 

Based on these arguments, the Examiner concludes that "[f]urther research needs to be
done to determine whether the decrease in PRO1277 cDNA expression compared to normal
esophagus or skin tissues supports a role for the peptide in the cancerous tissue; such a role has
not been suggested by the instant disclosure." Office Action at 5-6. The Examiner states that this
further research requirement makes clear that the asserted utility is not substantial, and therefore
the Appellants' invention is not complete. Id. at 6.

7. The Examiner has not established a Prima Facie case that Claims 1-5 lack
Utility 

The above arguments do not satisfy the Examiner's burden to "provide[] evidence 
showing that one of ordinary skill in the art would reasonably doubt the asserted utility." In re 
Brana, 51 F.3d 1560, 1566, 34 U.S.P.Q.2d 1436 (Fed. Cir. 1995). The Examiner has the burden 
of presenting "countervailing facts and reasoning sufficient to establish that a person of ordinary 
skill would not believe the Appellant's assertion of utility." M.P.E.P. at §2107.02 III.A., citing 
In re Brana, 51 F.3d 1560, 1566, 34 U.S.P.Q.2d 1436 (Fed. Cir. 1995) ("Only after the PTO
provides evidence showing that one of ordinary skill in the art would reasonably doubt the 
asserted utility does the burden shift to the Appellant to provide rebuttal evidence") (emphasis 
added). With the exception of the Hu et al., Haynes et al., and Chen et al. references, the
Examiner's assertions are not supported by any facts, evidence, or reasoning. Instead, the 
Examiner simply makes unsupported assertions. As for the three references cited by the 
Examiner, for the reasons discussed below, they do not support the Examiner's position. 
Therefore there is simply no evidence in the record to support the Examiner's assertion that 
Appellants' asserted utility is not substantial, and the invention is incomplete. Absent some
substantial evidence to support his assertions, the Examiner has failed to establish a prima facie 
showing that one of skill in the art would reasonably doubt the asserted utility, and the Board 
should accept Appellants' disclosed utility as sufficient. 

a. The data in Example 18 are sufficient to establish the asserted utility

Appellants turn first to the Examiner's arguments challenging the reliability of the data 
reported in Example 18. The Examiner argues that Example 18 and the first declaration of Mr. 
Grimaldi are insufficient to overcome the utility rejection of the pending claims because the 
specification does not teach what the normal level of expression is, it does not indicate how high 
the expression level is compared to the disease tissue, it lacks statistical correlation, there is no 
data to compare expression in the normal and disease samples, and that because the normal and 
tumor samples are not from the same person, there is no possibility of direct comparison between 
the normal and tumor samples. See First Office Action at 5-6. The Examiner also argues that the 
literature cautions researchers against drawing conclusions based on small changes in transcript
expression levels between normal and cancerous tissue, citing Hu et al (J. Proteome Res., (2003) 
2(4):405-12). Final Office Action at 5, 13, 14, and 16. None of these unsupported arguments are 
sufficient to establish a prima facie case that one of skill in the art would reasonably doubt the 
asserted utility. 

As an initial matter, Appellants note that the only objection by the Examiner to the data 
in Example 18 that is supported by any reasoning or evidence is the assertion that the literature 
cautions researchers against drawing conclusions based on small changes in transcript expression
levels between normal and cancerous tissue. The remainder of the objections are not supported
by any evidence or reasoning as to why this makes the data in Example 18 insufficient, and
therefore they cannot establish a prima facie case. See In re Brana, 51 F.3d 1560, 1566, 34 
U.S.P.Q.2d 1436 (Fed. Cir. 1995) ("Only after the PTO provides evidence showing that one of 
ordinary skill in the art would reasonably doubt the asserted utility does the burden shift to the 
applicant to provide rebuttal evidence.") (emphasis added). Despite this deficiency, Appellants
address the Examiner's objections below. 

The gene expression data in Example 18 of the specification show that the mRNA 
associated with protein PRO 1277 was more highly expressed in normal esophageal and skin
tissue compared to esophageal and melanoma tumor, respectively. See Specification at ¶ [0530]
and accompanying tables. Gene expression was analyzed using standard quantitative PCR 
amplification reactions of cDNA libraries isolated from different human tumor and normal 
human tissue samples. Id. It is well known in the art that the number of copies of a particular
cDNA in the cDNA library is determined by the number of copies of the corresponding mRNA 
in the sample. Therefore, the cDNA libraries can be used to determine the level of expression of 
the corresponding mRNA in the tissue. 

Appellants have asserted that identification of the differential expression of the PRO 1277 
polypeptide-encoding gene in tumor tissue compared to the corresponding normal tissue renders 
the molecule useful as a diagnostic tool for the determination of the presence or absence of tumor. 
Id. In support of this asserted utility, Appellants submitted as Exhibit 2 to their Amendment and
Response to Office Action a first Declaration of J. Christopher Grimaldi, an expert in the field of 
cancer biology. This declaration explains the importance of the data in Example 18, and how 
differential gene and protein expression studies are used to differentiate between normal and 
tumor tissue. See First Grimaldi Declaration. 

In paragraphs 6 and 7, Mr. Grimaldi explains that the semi-quantitative analysis 
employed to generate the data of Example 18 is sufficient to determine if a gene is over- or 
under-expressed in tumor cells compared to corresponding normal tissue. He states that any 
visually detectable difference seen between two samples is indicative of at least a two-fold 
difference in cDNA between the tumor tissue and the counterpart normal tissue. He also states 
that the results of the gene expression studies indicate that the genes of interest "can be used to 
differentiate tumor from normal." He explains that, contrary to the PTO's assertions, "[t]he
precise levels of gene expression are irrelevant; what matters is that there is a relative difference 
in expression between normal tissue and tumor tissue." First Grimaldi Declaration at ¶ 7.

This declaration makes clear that, because it is the relative level of expression between
normal tissue and suspected cancerous tissue that is important, the absolute level of expression
in normal tissue is irrelevant. As to the Examiner's questions about the reliability and
reproducibility of the results, Appellants employed standard techniques which are well-known 
and accepted by those of skill in the art. Mr. Grimaldi states that if a difference is detected using 
these techniques, "this indicates that the gene and its corresponding polypeptide and antibodies 
against the polypeptide are useful for diagnostic purposes..." Id. Thus, it is the uncontested
opinion of an expert in the field that the results are reliable enough to indicate that the claimed
antibodies are useful as diagnostic tools. As to the Examiner's concerns regarding the number 
and types of samples used, Mr. Grimaldi states that the samples are pooled samples of normal 
and tumor tissue, and therefore are more reliable than individual samples. Id. at ¶ 5.

The Examiner has also rejected the data because he questions the statistical significance 
of the data. However, Appellants are not required to prove utility to a statistical certainty, only 
that it is more likely than not true. See Nelson v. Bowler, 626 F.2d 853, 856-57, 206 U.S.P.Q. 
881, 883-84 (C.C.P.A. 1980) (reversing the Board and rejecting an argument that evidence of 
utility was insufficient because it was not statistically significant). Therefore, whether the results 
are statistically significant or not is irrelevant to establishing the asserted utility. 

The Examiner has also cited Hu et al (J. Proteome Res., (2003) 2(4):405-12) in support
of his assertion that the literature cautions researchers from drawing conclusions based on small
changes in transcript expression levels between normal and cancerous tissue. The PTO states
that Hu teaches that not all genes with increased expression in cancer have a known or published 
role in cancer. 

In Hu, the researchers used an automated literature-mining tool to summarize and 
estimate the relative strengths of all human gene-disease relationships published on Medline. 
They then generated a microarray expression dataset comparing breast cancer and normal breast 
tissue. Using their data-mining tool, they looked for a correlation between the strength of the 
literature association between the gene and breast cancer, and the magnitude of the difference in 
expression level. They report that for genes displaying a 5-fold change or less in tumors 
compared to normal, there was no evidence of a correlation between altered gene expression and 
a known role in the disease. See Hu at 411. However, among genes with a 10-fold or more 
change in expression level, there was a strong correlation between expression level and a 
published role in the disease. Id. at 412. Importantly, Hu reports that the observed correlation 
was only found among estrogen receptor-positive tumors, not ER-negative tumors. Id. 

The general findings of Hu are not surprising - one would expect that genes with the 
greatest change in expression in a disease would be the first targets of research, and therefore 
have the strongest known relationship to the disease as measured by the number of publications 
reporting a connection with the disease. The correlation reported in Hu only indicates that the 
greater the change in expression level, the more likely it is that there is a published or known role
for the gene in the disease, as found by their automated literature-mining software. Thus, Hu's
results merely reflect a bias in the literature toward studying the most prominent targets, and 
reflect nothing regarding the ability of a gene that is 2-fold or more differentially expressed in 
tumors to serve as a disease marker. 

Hu acknowledges the shortcomings of this method in explaining the disparity in Hu's 
findings for ER-negative versus ER-positive tumors: Hu attributes the "bias in the literature" 
toward the more prevalent ER-positive tumors as the explanation for the lack of any correlation 
between number of publications and gene expression levels in less-prevalent (and, therefore, less 
studied) ER-negative tumors. Id. Because of this intrinsic bias, Hu's methodology is unlikely to
ever note a correlation of a disease with less differentially-expressed genes and their 
corresponding proteins, regardless of whether or not an actual relationship between the disease 
and less differentially-expressed genes exists. Accordingly, Hu's methodology yields results that 
provide little or no information regarding biological significance of genes with less than 5-fold 
expression change in disease. Nowhere in Hu does it say that a lack of correlation in their study 
means that genes with a less than five-fold change in level of expression in cancer cannot serve 
as a molecular marker of cancer. 

Appellants submit that a lack of known role for PRO 1277 in cancer does not prevent its 
use as a diagnostic tool for cancer. There is a difference between use of a gene for distinguishing 
between tumor and normal tissue on the one hand, and establishing a role for the gene in cancer 
on the other. Genes with lower levels of change in expression may or may not be the most 
important genes in causing the disease, but the genes can still show a consistent and measurable 
change in expression. While such genes may or may not be good targets for further research, 
they can nonetheless be used as diagnostic tools. Thus, Hu does not refute the Appellants' 
assertion that the PRO 1277 gene can be used as a cancer diagnostic tool because it is 
differentially expressed in certain tumors. 

Contrary to the Examiner's assertion that one must know what role a gene or polypeptide 
plays in cancer for it to have utility, the PTO's own written policies recognize that the utility of a
nucleic acid does not depend on the function of the encoded gene product. The Utility 
Examination Guidelines published on January 5, 2001 state: "In addition, the utility of a claimed 
DNA does not necessarily depend on the function of the encoded gene product. A claimed DNA 
may have a specific and substantial utility because, e.g. it hybridizes near a disease-associated
gene or it has a gene regulating activity." (Federal Register, Volume 66, page 1095, Comment
14). Similarly, here the disclosed nucleic acids, as well as the encoded polypeptides and related 
antibodies, are useful for determining whether an individual has cancer regardless of whether or 
not they are the cause of the cancer. 

The position of the Examiner requiring a known role for PRO 1277 in cancer for utility is 
also inconsistent with the analogous standard for therapeutic utility of a compound where "the 
mere identification of a pharmacological activity of a compound that is relevant to an asserted 
pharmacological use provides an 'immediate benefit to the public' and thus satisfies the utility
requirement." M.P.E.P. § 2107.01 (emphasis original). Here, the mere identification of altered
expression in tumors is relevant to diagnosis of tumors, and, therefore, provides an immediate 
benefit to the public. 

The data in Example 18 and the first Grimaldi Declaration are therefore sufficient to 
establish the asserted utility, and the Examiner has not rebutted the presumption of utility that the 
Appellants' application is afforded. Mr. Grimaldi is an expert in the field who conducted or
supervised the experiments at issue. His declaration is based on personal knowledge of the 
relevant facts at issue. Appellants have reminded the Examiner that "Office personnel must
accept an opinion from a qualified expert that is based upon relevant facts whose accuracy is not
being questioned." M.P.E.P. § 2107 (emphasis added). In addition, declarations relating to
issues of fact should not be summarily dismissed as "opinions" without an adequate explanation
of how the declaration fails to rebut the Examiner's position. See In re Alton, 76 F.3d 1168 (Fed.
Cir. 1996). The Examiner has offered no reason or evidence to reject either the underlying data 
or Mr. Grimaldi's conclusions. Therefore, the Examiner should accept Mr. Grimaldi's opinion 
with regard to his statement that "any visually detectable difference seen between two samples is 
indicative of at least a two-fold difference in cDNA between the tumor tissue and the counterpart 
normal tissue" and that the genes of interest "can be used to differentiate tumor from normal."

In conclusion, Appellants submit that the evidence reported in Example 18, supported by
the first Grimaldi Declaration, establishes that there is at least a two-fold difference in PRO 1277
mRNA between esophageal and melanoma tumor tissue and normal esophageal and skin tissue, 
respectively. Therefore, it follows that the PRO 1277 gene, polypeptide, and antibody can be 
used to distinguish esophageal and melanoma tumor tissue from their normal tissue counterparts. 
The Examiner has not offered any significant arguments or evidence to the contrary, and
therefore has not established a prima facie case that one of skill in the art would reasonably
doubt the asserted utility. 

b. The two references cited by the Examiner do not refute Appellants'
assertion that a change in mRNA levels leads to a corresponding change 
in the level of the encoded protein 

Appellants turn next to the Examiner's second argument that "polypeptide levels cannot be
accurately predicted from mRNA levels." Office Action at 4, 14, and 16. For support, the 
Examiner cites two references, Haynes et al (Electrophoresis, (1998) 19(11):1862-71) and Chen
et al (Mol. and Cell. Proteomics, (2002) 1:304-313). Based on these references, the Examiner 
concludes that "it is clear that one skilled in the art would not assume that a higher expression of 
mRNA would correlate with increased polypeptide levels." Office Action at 5. For the reasons 
discussed below, neither of these references is contrary to Appellants' assertion that generally
speaking, changes in mRNA levels lead to corresponding changes in the level of polypeptide. 

Haynes studied whether there is a correlation between the level of mRNA expression and 
the level of protein expression for 80 selected genes from yeast. The genes were selected 
because they constituted a relatively homogeneous group with respect to predicted half-life and 
expression level of the protein products. See Haynes at 1863. Haynes did not examine whether 
a change in transcript level for a particular gene led to a change in the level of expression of the 
corresponding protein. Instead, Haynes determined whether the steady-state transcript level 
correlated with the steady-state level of the corresponding protein based on an analysis of 80 
different genes. 

Haynes reported having "found a general trend but no strong correlation between protein
and transcript levels." Id. However, a cursory inspection of Fig. 1 shows a clear correlation 
between the mRNA levels and protein levels measured. This correlation is confirmed by an 
inspection of the full-length research paper from which the data in Fig. 1 were derived, (Gygi et 
al, Molecular and Cellular Biology, Mar. 1999, 1720-1730), submitted as Exhibit 5 with 
Appellants' Amendment and Response to Final Office Action. Gygi states that "there was a 
general trend of increased protein levels resulting from increased mRNA levels," with a 
correlation coefficient of 0.935, indicating a strong correlation. Gygi at 1726. Moreover, Gygi 
also states that the correlation is especially strong for highly expressed mRNAs. Id. Thus, it is
not clear that Haynes even supports the Examiner's position, as Haynes did report a general trend,
and Gygi reports a strong correlation between increasing mRNA levels and increasing protein 
levels. 

The PTO focuses on the portion of Haynes where the authors reported that for some of 
the studied genes with equivalent mRNA levels, there were differences in corresponding protein 
expression, including some that varied by more than 50-fold. Similarly, Haynes reports that 
different proteins with similar expression levels were maintained by transcript levels that varied 
by as much as 40-fold. Office Action at 4-5. Thus, Haynes showed that for one type of yeast, 
similar mRNA levels for different genes did not universally result in equivalent protein levels for 
the different gene products, and similar protein levels for different gene products did not 
universally result from equivalent mRNA levels for the different genes. These results are 
expected, since there are many factors that determine translation efficiency for a given transcript, 
or the half-life of the encoded protein. Not surprisingly, based on these results, Haynes 
concluded that protein levels cannot always be accurately predicted from the level of the 
corresponding mRNA transcript when looking at the level of transcripts across different genes.

Importantly, Haynes did not say that for a single gene, the level of mRNA transcript is 
not positively correlated with the level of protein expression. Appellants have asserted that 
increasing or decreasing the level of mRNA for the same gene leads to an increase or decrease in
the corresponding protein. Haynes did not study this issue and says absolutely nothing about it. 
One cannot look at the level of mRNA across several different genes to investigate whether a 
change in the level of mRNA for a particular gene leads to a change in the level of protein for that
gene. Therefore, Haynes is not inconsistent with or contradictory to the utility of the instant 
claims, and offers no support for the Examiner's rejection of Appellants' asserted utility. 
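
The distinction drawn above between a correlation across different genes and a correlation for a single gene can be illustrated numerically. The following Python sketch is illustrative only and is not drawn from Haynes, Gygi, or the record: the four genes, their mRNA levels, and their per-gene translation efficiencies are hypothetical values, chosen so that protein tracks mRNA exactly within each gene while the per-gene averages are uncorrelated across genes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL genes: (mean mRNA level, protein produced per unit of mRNA).
# These values are invented for illustration; they are not measured data.
genes = {
    "gene_A": (1.0, 10.0),
    "gene_B": (2.0, 15.0),
    "gene_C": (3.0, 10.0),
    "gene_D": (4.0, 2.5),
}

# HYPOTHETICAL samples in which each gene's mRNA is 0.5x, 1x and 1.5x its mean.
sample_factors = [0.5, 1.0, 1.5]

within_gene_r = {}
mean_mrna, mean_protein = [], []
for name, (mrna_mean, efficiency) in genes.items():
    mrna = [mrna_mean * f for f in sample_factors]
    protein = [efficiency * m for m in mrna]
    # Correlation between mRNA and protein for this one gene across samples.
    within_gene_r[name] = pearson(mrna, protein)
    mean_mrna.append(sum(mrna) / len(mrna))
    mean_protein.append(sum(protein) / len(protein))

# Correlation between average mRNA and average protein across different genes.
across_genes_r = pearson(mean_mrna, mean_protein)

print("within each gene:", within_gene_r)
print("across gene averages:", across_genes_r)
```

In this constructed example each gene's protein level rises and falls in step with its own mRNA level (a within-gene correlation of 1), yet the correlation of average protein against average mRNA across the four genes is zero. The example shows only that the two measurements answer different questions; it says nothing about the actual magnitudes reported in Haynes.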

Appellants turn next to the Chen et al reference, where the authors examined the 
relationship between mRNA levels and protein levels in 76 lung adenocarcinomas and nine non- 
tumor lung samples. 

As an initial matter, it is important to note that a portion of Chen is clearly not relevant to 
Appellants' assertion that changes in the level of mRNA lead to changes in the level of the 
encoded polypeptide. In one experiment which was similar to that of Haynes, Chen examined 
the global relationship between mRNA and the corresponding protein abundance by calculating 
the average mRNA and protein level of all the samples for each gene or protein, and then looked
for a correlation across different genes. Based on these data, Chen reported that "no significant
correlation between mRNA and protein expression was found (r = -0.025) if the average levels of 
mRNA or protein among all samples were applied across the 165 protein spots (98 genes)." 
Chen at Abstract. This measurement of a correlation across different genes is not relevant to 
Appellants' asserted utility for the same reasons discussed above with respect to the Haynes et al 
reference, and is apparently not relied on by the Examiner. 

Chen also looked at the level of mRNA of 98 individual genes and their corresponding 
proteins across the samples. Chen reports that 17% (28 of 165) of the protein spots, or 21.4% 
(21 of 98) of the genes, showed a statistically significant correlation between protein and mRNA 
expression. Chen at Abstract. It is these results that the Examiner relies on for support. 

However, read in its entirety, Chen provides scant evidence to counter Appellants' 
asserted utility because portions of Chen support Appellants' assertions, and the remaining 
portions provide little insight into the relationship between changes in mRNA levels and changes 
in the corresponding protein levels for mRNA that is differentially expressed in tumor cells 
relative to normal cells. 

Appellants have asserted that changes in mRNA levels, particularly those which are two- 
fold or greater, will correspond with measurable changes in polypeptide expression. The data in 
Chen support Appellants' assertion. In Figures 2A-2C, Chen plots mRNA value vs. protein 
value for three genes. In these figures, a wide range of mRNA expression levels were observed 
(approximately seven- to eight-fold), and a correlation between mRNA and protein levels was 
observed for all three mRNA/protein pairs. This supports Appellants' assertion that there is a 
correlation between changes in mRNA levels which are two-fold or greater and changes in 
polypeptide expression. 

The Examiner relies on the fact that Chen also reports a lack of correlation for some 
mRNA/protein pairs to support his assertion that polypeptide levels cannot be accurately 
predicted from mRNA levels. However, as is explained below, the apparent lack of a correlation 
cannot be used as evidence that Appellants' assertion of a general correlation is wrong. 

To determine if there is a correlation between changes in mRNA and changes in protein 
levels, one would have to conduct experiments where a measurable change in mRNA for a 
particular gene is observed, and then examine if there was a corresponding change in the level of 
the corresponding protein. Stated differently, if there is no substantial change in mRNA levels
for a particular gene, one cannot measure a correlation between changes in mRNA and changes
in the encoded protein for that gene. Therefore, one must know if the individual genes studied 
by Chen were differentially expressed to know if the observed lack of correlation has any 
relevance to Appellants' assertions of a general correlation between changes in mRNA and 
protein. 

Importantly, unlike Appellants, Chen did not examine differences in mRNA between 
tumor and normal tissue where one would expect to find substantial changes in the level of 
mRNA for certain genes. Instead, Chen merely selected proteins whose identity could be 
determined regardless of any changes in expression level. Chen at 306, right column. Therefore,
it is not known if there was any substantial difference in mRNA levels for the various studied 
genes across samples - in short, with the exception of the genes in Figures 2A-2C, it is not 
known if the genes examined were differentially expressed. Also of significance for Appellants' 
asserted utility is the fact that Chen did not attempt to examine any differential expression 
between the cancerous lung samples and the non-cancerous lung samples - Chen did not 
distinguish between cancer and normal samples in their analysis. Since almost all samples tested 
by Chen were from the same type of tissue, one would expect most genes examined by Chen to 
have similar mRNA or protein levels across the samples. In the absence of substantial 
differential expression, no correlation would be observed. Because it is not known if there was a 
change in the level of the genes studied by Chen, i.e. whether they were differentially expressed, 
the lack of an observed correlation cannot be used to counter Appellants' assertion. 

In sum, the only data reported by Chen which shows substantial changes in the 
expression of mRNA, Figures 2A-C, confirms Appellants' assertion that substantial changes in 
mRNA levels (e.g., 2-fold or greater) will correspond to substantial changes in polypeptide 
expression. Further, these data explain the lack of observed correlation between mRNA levels 
and protein levels for other genes reported by Chen - there is no indication the genes are 
differentially expressed. Thus, Chen's results do not refute Appellants' position. Instead, Chen 
supports Appellants' position that a significant correlation between changes in mRNA and 
protein levels exists for changes in mRNA levels that are 2-fold or greater. 

In further support of Appellants' position, Chen cites Celis et al (FEBS Lett., 480:2-16 
(2000)) stating that the authors "found a good correlation between transcript and protein levels 
among 40 well resolved, abundant proteins using a proteomic and microarray study of bladder
cancer." Chen at 311, first column (emphasis added). As mentioned above, the lack of a
correlation across genes is not relevant to Appellants' asserted utility, and therefore Chen's 
discussion of this issue and citation of Anderson and Seilhamer (Electrophoresis, 18:533-37 
(1997)) and Gygi et al (Mol. Cell. Bio., 19:1720-30 (1999)) offer no support for the Examiner's 
position. 

Given the fact that portions of Chen as well as the relevant references cited by Chen 
support Appellants' position, and the remainder of Chen cannot be relied on as contrary to the 
Appellants' position, the Examiner has failed to establish a prima facie case that one of skill in 
the art would doubt Appellants' asserted utility based on any lack of correlation between changes 
in mRNA level and changes in the corresponding protein level. 

c. Conclusion - Examiner has failed to establish a prima facie case that 
one of skill in the art would doubt Appellants' asserted utility
The Examiner has relied on essentially two arguments in rejecting the pending claims for 
lack of utility. First, the Examiner has questioned the sufficiency, reliability and significance of 
the data reported in Example 18 as well as the supporting first Grimaldi declaration. The 
Examiner has argued that absent some known translocation or mutation of PRO 1277, or some 
role for PRO 1277 in cancer formation or development, the disclosure is insufficient. Second, the 
Examiner relies on Haynes et al and Chen et al to support the assertion that polypeptide levels 
cannot be accurately predicted from mRNA levels. Appellants have responded to each of these 
arguments in turn. 

First, Appellants have shown that the data in Example 18 are sufficient to show that 
PRO 1277 is useful as a cancer diagnostic tool. This assertion is supported by the first Grimaldi 
declaration. The Examiner has not provided any substantial reason or evidence for one of skill in 
the art to doubt the reliability or usefulness of Example 18, or the facts and conclusions in the 
first Grimaldi declaration. 

Second, Appellants have shown that the Haynes reference is simply not relevant to the issue
of whether a change in mRNA levels leads to a corresponding change in the level of the encoded 
protein. Appellants have also shown that portions of Chen et al, as well as some of the 
references cited by Chen, actually support Appellants' assertion that changes in mRNA levels
generally correlate with changes in the level of the encoded polypeptide. The remainder of Chen
is not reliable enough to offer any support for the Examiner's position. 

Taken together, the Examiner's arguments are not sufficient to satisfy the Examiner's 
burden to "provide[] evidence showing that one of ordinary skill in the art would reasonably 
doubt the asserted utility." In re Brana, 51 F.3d 1560, 1566, 34 U.S.P.Q.2d 1436 (Fed. Cir. 
1995). The Examiner's arguments are largely conclusory statements which are not supported by 
any substantial evidence or reasoning which explains why one of ordinary skill in the art would 
reasonably doubt the asserted utility. Therefore, the Board should accept the Appellants' 
disclosure of utility. See Ex parte Rubin, 5 U.S.P.Q. 2d 1461 (Bd. Pat. App. & Interf. 1987) 
("There is no factual support in this record for the examiner's questioning of the denaturation test 
reported in the specification. ... No reason to doubt 'the objective truth' of the asserted utility 
having been advanced by the examiner, we accept appellant's disclosure of utility corresponding 
in scope to the claimed subject matter."). 

8. Appellants have provided Sufficient Rebuttal Evidence of Utility 

"Only after the PTO provides evidence showing that one of ordinary skill in the art would
reasonably doubt the asserted utility does the burden shift to the applicant to provide rebuttal
evidence." In re Brana, 51 F.3d 1560, 1566, 34 U.S.P.Q.2d 1436 (Fed. Cir. 1995). The rebuttal
evidence must be sufficient such that when it is considered as a whole, it is more likely than not
that the asserted utility is true. See In re Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d 1443,
1444 (Fed. Cir. 1992) (stating that the evidentiary standard to be used throughout ex parte
examination in setting forth a rejection is a preponderance of the evidence, or "more likely than
not" standard). The M.P.E.P. summarizes the standard of proof required:

[T]he applicant does not have to provide evidence sufficient to establish that an 
asserted utility is true "beyond a reasonable doubt." Nor must the applicant 
provide evidence such that it establishes an asserted utility as a matter of 
statistical certainty. Instead, evidence will be sufficient if, considered as a whole, 
it leads a person of ordinary skill in the art to conclude that the asserted utility is 
more likely than not true. M.P.E.P. § 2107.02, part VII (emphasis in original,
citations omitted). 

Appellants remind the Board that the Federal Circuit has stated that the standard for satisfying 
the utility requirement is a low one: "The threshold of utility is not high: An invention is 'useful'
under section 101 if it is capable of providing some identifiable benefit." Juicy Whip, Inc. v.
Orange Bang, Inc., 185 F.3d 1364, 1366, 51 U.S.P.Q. 2d 1700 (Fed. Cir. 1999). 

Even if the Examiner has satisfied his burden of presenting a prima facie case of lack of 
utility, Appellants have supplied more than enough rebuttal evidence, such that when considered 
as a whole, one of skill in the art would conclude that the asserted utility is more likely than not 
true. As discussed in detail below, Appellants have provided sufficient evidence that the gene 
encoding the PRO 1277 polypeptide is differentially expressed in certain cancers and can 
therefore be used as a diagnostic tool. In addition, Appellants have shown that it is well 
established in the art that there is a reasonable correlation between changes in mRNA level and 
changes in the corresponding protein level such that one of skill in the art would believe that the 
PRO 1277 polypeptide is also differentially expressed in certain cancers. Therefore, considering 
the evidence as a whole, one of skill in the art would believe that it is more likely than not that 
the claimed antibodies are useful as diagnostic tools for cancer, particularly esophageal and 
melanoma tumors. 

a. Appellants have established that the gene encoding the PRO 1277
polypeptide is differentially expressed in certain cancers 

As discussed above, the Examiner has not provided any evidence or reasoning to 
challenge the reliability and significance of the data in Example 18 which reports that the mRNA
for PRO 1277 is more highly expressed in normal esophageal and skin tissue compared to 
esophageal and melanoma tumor, respectively. In contrast to this complete lack of evidence on 
the part of the Examiner, Appellants have submitted the first Grimaldi declaration. That 
declaration establishes that it is the opinion of an expert in the field who has personal knowledge 
of the facts surrounding Example 18 that there is at least a two-fold difference in mRNA for 
PRO 1277 between the tumor tissue and the counterpart normal tissue, and that the PRO 1277 
genes, polypeptides and antibodies are useful for differentiating tumor tissue from normal tissue. 
The Examiner has not provided any evidence or reasoning to challenge the facts and conclusions
of the first Grimaldi declaration in support of Example 18. 

Given the disclosure of Example 18 and the supporting first Grimaldi declaration on the 
one hand, and the complete lack of any evidence on the other, it is clear that considering the 
evidence as a whole, one of skill in the art would conclude that it is more likely than not that the
PRO 1277 gene is differentially expressed in esophageal and melanoma tumor tissue compared to
their normal tissue counterparts such that it is useful as a diagnostic tool to distinguish tumor tissue
from normal tissue. 

As Appellants explain below, it is more likely than not that the PRO 1277 polypeptide is 
also differentially expressed in esophageal and melanoma tumor tissue, and can therefore be used 
to distinguish tumor tissue from normal tissue. This provides utility for the claimed antibodies to 
the PRO 1277 polypeptide. 

b. Appellants have established that generally there is a correlation between 
changes in mRNA expression levels and changes in the expression level 
of the encoded protein 

Appellants next turn to the second portion of their argument in support of their asserted 
utility - that it is well-established in the art that in most cases a change in the level of mRNA for 
a particular protein leads to a corresponding change in the level of the encoded protein. Given 
Appellants' evidence of differential expression of the mRNA for the PRO 1277 polypeptide in 
esophageal and melanoma tumor, it is more likely than not that the PRO 1277 polypeptide is 
likewise differentially expressed, and therefore the claimed antibodies are useful as diagnostic 
tools, particularly for esophageal and melanoma tumor. 

In support of the assertion that changes in mRNA are positively correlated to changes in 
protein levels, Appellants submitted a second Declaration by J. Christopher Grimaldi, an expert 
in the field of cancer biology (originally submitted as Exhibit 6 with the Appellants' Amendment 
and Response to Office Action). As stated in paragraph 5 of the declaration, "Those who work 
in this field are well aware that in the vast majority of cases, when a gene is over-expressed... the 
gene product or polypeptide will also be over-expressed.... This same principal applies to gene 
under-expression." Second Grimaldi Declaration at ¶ 5. Further, "increased mRNA expression 
is expected to result in increased polypeptide expression, and the detection of decreased mRNA 
expression is expected to result in decreased polypeptide expression." Id. 

Appellants also submitted the declaration of Paul Polakis, Ph.D., an expert in the field of 
cancer biology (attached as Exhibit 7 to Appellants' Amendment and Response to Office Action). 
As stated in paragraph 6 of his declaration: 

Based on my own experience accumulated in more than 20 years of research, 
including the data discussed in paragraphs 4 and 5 above [showing a positive 
correlation between mRNA levels and encoded protein levels in the vast majority 
of cases studied in relation to the present invention] and my knowledge of the 
relevant scientific literature, it is my considered scientific opinion that for human 
genes, an increased level of mRNA in a tumor cell relative to a normal cell 
typically correlates to a similar increase in abundance of the encoded protein in 
the tumor cell relative to the normal cell. In fact, it remains a central dogma in 
molecular biology that increased mRNA levels are predictive of corresponding 
increased levels of the encoded protein. Polakis Declaration at ¶ 6 (emphasis 
added). 

Dr. Polakis acknowledges that there are published cases where such a correlation does not exist, 
but states that it is his opinion, based on over 20 years of scientific research, that "such reports 
are exceptions to the commonly understood general rule that increased mRNA levels are 
predictive of corresponding increased levels of the encoded protein." Polakis Declaration at ¶ 6. 

The statements of Grimaldi and Polakis are supported by the teachings in Molecular 
Biology of the Cell, a leading textbook in the field (Bruce Alberts, et al., Molecular Biology of 
the Cell (3rd ed. 1994) (submitted with Appellants' Amendment and Response to Final Office 
Action as Exhibit 6, hereinafter "Cell 3rd") and (4th ed. 2002) (submitted with Appellants' 
Amendment and Response to Final Office Action as Exhibit 7, hereinafter "Cell 4th")). 
Figure 9-2 of Cell 3rd shows the steps at which eukaryotic gene expression can be controlled. The first 
step depicted is transcriptional control. Cell 3rd provides that "[f]or most genes transcriptional 
controls are paramount. This makes sense because, of all the possible control points illustrated 
in Figure 9-2, only transcriptional control ensures that no superfluous intermediates are 
synthesized." Cell 3rd at 403 (emphasis added). In addition, the text states that "[a]lthough 
controls on the initiation of gene transcription are the predominant form of regulation for most 
genes, other controls can act later in the pathway from RNA to protein to modulate the amount of 
gene product that is made." Cell 3rd at 453 (emphasis added). Thus, as established in Cell 3rd, 
the predominant mechanism for regulating the amount of protein produced is by regulating 
transcription. 

In Cell 4th, Figure 6-3 on page 302 illustrates the basic principle that there is a correlation 
between increased gene expression and increased protein expression. The accompanying text 
states that "a cell can change (or regulate) the expression of each of its genes according to the 
needs of the moment - most obviously by controlling the production of its mRNA." Cell 4th at 
302 (emphasis added). Similarly, Figure 6-90 on page 364 of Cell 4th illustrates the path from 
gene to protein. The accompanying text states that while potentially each step can be regulated 
by the cell, "the initiation of transcription is the most common point for a cell to regulate the 
expression of each of its genes." Cell 4th at 364 (emphasis added). This point is repeated on 
page 379, where the authors state that of all the possible points for regulating protein expression, 
"[f]or most genes transcriptional controls are paramount." Cell 4th at 379 (emphasis added). 

Further support for Appellants' position can be found in the textbook, Genes VI, 
(Benjamin Lewin, Genes VI (1997)) (submitted with Appellants' After Final Amendment as 
Exhibit 8) which states "having acknowledged that control of gene expression can occur at 
multiple stages, and that production of RNA cannot inevitably be equated with production of 
protein, it is clear that the overwhelming majority of regulatory events occur at the initiation of 
transcription." Genes VI at 847-848 (emphasis added). 

Additional support is also found in Zhigang et al., World Journal of Surgical Oncology 
2:13, 2004 (submitted with Appellants' Amendment and Response to Final Office Action as 
Exhibit 9). Zhigang studied the expression of prostate stem cell antigen (PSCA) protein and 
mRNA to validate it as a potential molecular target for diagnosis and treatment of human 
prostate cancer. The data showed "a high degree of correlation between PSCA protein and 
mRNA expression." Zhigang at 4. Of the samples tested, 81 out of 87 showed a high degree of 
correlation between mRNA expression and protein expression. The authors conclude that "it is 
demonstrated that PSCA protein and mRNA overexpressed in human prostate cancer, and that 
the increased protein level of PSCA was resulted from the upregulated transcription of its 
mRNA." Id. at 6. Even though the correlation between mRNA expression and protein 
expression occurred in 93% of the samples tested, not 100%, the authors state that "PSCA may 
be a promising molecular marker for the clinical prognosis of human PCa and a valuable target 
for diagnosis and therapy of this tumor." Id. at 7. 

Further, Meric et al., Molecular Cancer Therapeutics, vol. 1, 971-979 (2002) (submitted 
with Appellants' Amendment and Response to Final Office Action as Exhibit 10), states the 
following: 

The fundamental principle of molecular therapeutics in cancer is to exploit the 
differences in gene expression between cancer cells and normal cells... [M]ost 
efforts have concentrated on identifying differences in gene expression at the 
level of mRNA, which can be attributable to either DNA amplification or to 
differences in transcription. Meric et al. at 971 (emphasis added). 

Exploiting differences in gene expression between cancer cells and normal cells would not be a 
"fundamental principle" of molecular cancer therapeutics if there were no significant correlation 
between gene expression and protein levels. Stated another way, changes in mRNA without 
corresponding changes in protein levels would have little or no effect on cellular biology, and 
those of skill in the art would have no reason to examine the differences in gene expression at the 
mRNA level without such a correlation. However, as one of skill in the art recognizes, there is a 
strong correlation between changes in mRNA and changes in protein level. It is because of this 
strong correlation that it remains a "fundamental principle" of molecular therapeutics in cancer 
to look at changes in mRNA level. 

Together, the declarations of Grimaldi and Polakis, the accompanying references, and the 
excerpts and references discussed above all establish that the accepted understanding in the art is 
that there is a reasonable correlation between changes in gene expression and changes in the 
level of the encoded protein. In contrast to this substantial amount of evidence supporting 
Appellants' position, the Examiner has cited two references, Haynes et al. and Chen et al. 
However, as discussed above, Haynes is not relevant to the issue of whether a change in mRNA 
levels leads to a change in the level of the corresponding protein. Likewise, portions of Chen 
and the relevant references cited by Chen actually support Appellants' position, and the 
remainder of Chen is inconclusive. When the evidence is considered as a whole, its 
preponderance clearly weighs in favor of Appellants. 

Appellants have presented sufficient evidence to establish that the mRNA for PRO 1277 
is differentially expressed in esophageal and melanoma tumors compared to their normal tissue 
counterparts, and that it is more likely than not that this leads to differential expression of the 
PRO 1277 polypeptide. This makes the claimed antibodies to PRO 1277 polypeptide useful for 
diagnosing cancer, particularly esophageal and melanoma tumors. Given the overwhelming 
amount of evidence in support of Appellants' position, and the near absence of any evidence in 
support of the Examiner's position, when considered as a whole the evidence leads a person of 
ordinary skill in the art to conclude that the asserted utility is more likely than not true. 

c. The asserted utility is specific 

Finally, Appellants address the PTO's assertion that the asserted utilities are not specific 
to the claimed antibodies related to PRO 1277. 

Specific Utility is defined as utility which is "specific to the subject matter claimed," in 
contrast to "a general utility that would be applicable to the broad class of the invention." 
M.P.E.P. § 2107.01 I. Appellants submit that the evidence of differential expression of the 
PRO 1277 gene and polypeptide in certain types of tumor cells, along with the declarations and 
references discussed above, provide a specific utility for the claimed antibodies. 

As discussed above, there are significant data which show that the gene for the PRO 1277 
polypeptide is expressed at least two-fold higher in normal esophageal and skin tissue compared 
to esophageal and melanoma tumor, respectively. These data are strong evidence that the 
PRO 1277 gene and polypeptide are associated with esophageal and melanoma tumors. Thus, 
contrary to the assertions of the Examiner, Appellants have provided evidence associating the 
PRO 1277 gene and polypeptide with a specific disease. The asserted utility for antibodies to the 
PRO 1277 polypeptide as a diagnostic tool for cancer, particularly esophageal and melanoma 
tumor, is a specific utility - it is not a general utility that would apply to the broad class of 
antibodies. 

9. The Examiner's Response to Appellants' Evidence is Insufficient to Rebut 
Appellants' Arguments 

The Examiner has stated that the Grimaldi and Polakis declarations are "insufficient to 
overcome the rejection of claims 1-5" based on 35 U.S.C. §§ 101 and 112. Office Action at 6. 

a. The Examiner's response to the First Grimaldi Declaration 

The Examiner has rejected the first Grimaldi Declaration based on two arguments. The 
first argument is that "there is no evidentiary art that would corroborate for example, that 'any 
visually detectable difference seen between two samples is indicative of at least a two-fold 
difference in cDNA between the tumor tissue and the counterpart normal tissue.'" Office Action 
at 13 (quoting first Grimaldi Declaration). 

Appellants submit that the declaration of Mr. Grimaldi is based on personal knowledge of 
the relevant facts at issue. Mr. Grimaldi is an expert in the field and conducted or supervised the 
experiments at issue. Appellants have reminded the Examiner that "[o]ffice personnel must 
accept an opinion from a qualified expert that is based upon relevant facts whose accuracy is not 
being questioned." PTO Utility Examination Guidelines (2001) (emphasis added). In addition, 
declarations relating to issues of fact should not be summarily dismissed as "opinions" without 
an adequate explanation of how the declaration fails to rebut the Examiner's position. In re 
Alton, 76 F.3d 1168 (Fed. Cir. 1996). 

In addition, Appellants provided as Exhibit 1 to their Amendment and Response to Final 
Office Action a copy of page 122 of the 2002-2003 New England Biolabs catalog. This exhibit 
shows DNA size markers of differing lengths run on an agarose gel. The column on the left 
provides the mass of each marker in nanograms and the column on the right provides the length 
of the marker. It is apparent that the band intensities of markers having two-fold mass differences 
are readily distinguishable by eye (see, e.g., the difference in band intensities of the 0.1 kb 
fragment present at 61 ng and the 0.5 kb marker present at 124 ng). Accordingly, Appellants 
maintain that the procedures used to detect differences in expression levels were sufficiently 
sensitive to detect two-fold differences. 

The Examiner has not supplied any reasons or evidence to question the accuracy of the 
facts upon which Mr. Grimaldi based his opinions. Mr. Grimaldi has personal knowledge of the 
relevant facts, has based his opinion on those facts, and the Examiner has offered no reason or 
evidence to reject either the underlying facts or his opinion. Therefore, the Examiner and Board 
should accept Mr. Grimaldi's opinion with regard to his statement that "any visually detectable 
difference seen between two samples is indicative of at least a two-fold difference in cDNA 
between the tumor tissue and the counterpart normal tissue" and that the nucleic acids of interest 
"can be used to differentiate tumor from normal." Together, these statements establish that there 
is at least a two-fold difference in expression, and that the results are reliable enough that they 
can be used to distinguish tumor from normal tissue. 

The Examiner's second argument in response to the first Grimaldi declaration is that "one 
cannot determine from the data in the specification whether the observed 'amplification' of 
nucleic acid is due to increase in copy number, or alternatively due to increase in transcription 
rates," and that the specification does not provide any information regarding "differential mRNA 
levels of PRO 1277," but rather "describes only gene amplification data." Office Action at 12. 

Appellants again emphasize that the data in Example 18 are gene expression data, not 
gene amplification data. The specification and the first Grimaldi Declaration make clear that 
Example 18 used semi-quantitative PCR of cDNA libraries. Therefore, one of skill in the art 
would know that Example 18 is a measure of mRNA levels, and reflects differential PRO 1277 
gene expression, not gene amplification. Therefore, the Examiner's arguments regarding gene 
amplification do not support the Examiner's challenge of the sufficiency of the Example 18 data, 
or the first Grimaldi Declaration. 

b. The Examiner's response to the Second Grimaldi Declaration 

In response to the second Grimaldi Declaration, the Examiner focuses on paragraph 4 of 
the declaration where it states that for chromosomal aberrations which result in aberrant 
expression of a mRNA and corresponding protein, "the gene product is a promising target for 
cancer therapy, for example, by the therapeutic antibody approach." Office Action at 8 
(emphasis added). The Examiner rejects this argument as unpersuasive because, 
unlike the genes discussed in the references cited in the declaration, "[t]he PRO 1277 gene, ... has 
not been associated with tumor formation or the development of cancer, nor has it been shown to 
be predictive of such. Similarly, ... no translocation of PRO 1277 is known to occur. ... No 
mutation or translocation of PRO 1277 has been associated with esophagus or skin cancer." 
Office Action at 8-9. The Examiner concluded that "[i]n the absence of any of the above 
information" the disclosure was insufficient to satisfy the requirements of § 101. Id. at 9. 

The Examiner's arguments fail to establish that one of skill in the art would doubt 
Appellants' asserted utility. Once again, the Examiner has failed to establish how the "absence 
of any of the above information" is relevant to the asserted utility by supplying evidence or 
reasoning to support his assertion. See In re Brana, 51 F.3d 1560, 1566, 34 U.S.P.Q.2d 1436 
(Fed. Cir. 1995) ("Only after the PTO provides evidence showing that one of ordinary skill in the 
art would reasonably doubt the asserted utility does the burden shift to the applicant to provide 
rebuttal evidence.") (emphasis added). 

The lack of a known role for PRO 1277 in tumor formation or the development of cancer 
does not prevent its use as a diagnostic tool for cancer. Likewise, the fact that there is no known 
translocation or mutation of PRO 1277 is irrelevant to whether its differential expression can be 
used to assist in diagnosis of cancer - one does not need to know why PRO 1277 is differentially 
expressed, or what the consequence of the differential expression is, in order to exploit the 
differential expression to distinguish tumor from normal tissue. 

The Revised Interim Utility Guidelines promulgated by the PTO recognize that proteins 
which are differentially expressed in cancer have utility. (See the caveat in Example 12, which 
states that the utility requirement is satisfied where a protein is expressed in melanoma cells but 
not on normal skin and antibodies against the protein can be used to diagnose cancer.) In 
addition, while Appellants appreciate that actions taken in other applications are not binding on 
the PTO with respect to the present application, Appellants note that the PTO has issued several 
patents claiming differentially expressed polypeptides and antibodies to the same, or methods 
employing such antibodies. See, e.g., U.S. Patent No. 6,414,117, U.S. Patent No. 6,124,433, U.S. 
Patent No. 6,156,500, and U.S. Patent No. 6,562,343. 

In addition, Appellants note that they did not even rely on the portion of the second 
Grimaldi declaration cited by the Examiner which discusses targets for cancer therapy. Instead, 
Appellants submitted the second Grimaldi declaration in support of the assertion that changes in 
mRNA are positively correlated to changes in protein levels. Appellants relied on paragraph 5 of 
the declaration which states: "Those who work in this field are well aware that in the vast 
majority of cases, when a gene is over-expressed... the gene product or polypeptide will also be 
over-expressed.... This same principal applies to gene under-expression." Amendment and 
Response to Final Office Action at 21, quoting Second Grimaldi Declaration at ¶ 5. As support 
for this statement, Mr. Grimaldi noted that "[t]echniques used to detect mRNA, such as Northern 
Blotting, Differential Display, in situ hybridization, quantitative PCR, Taqman, and more 
recently Microarray technology all rely on the dogma that a change in mRNA will represent a 
similar change in protein. If this dogma did not hold true then these techniques would have little 
value and not be so widely used." Second Grimaldi Declaration at ¶ 5. Whether the differential 
expression of mRNA is due to mutations or translocations has no bearing on the portion of the 
Grimaldi declaration relied on by Appellants. 

c. The Examiner's response to the Polakis Declaration 

In response to the Polakis Declaration, the Examiner makes two arguments. First, the 
Examiner states that only Dr. Polakis' conclusions are provided in the declaration. This is 
clearly not the case. 

Appellants rely on the following portion of the Polakis Declaration: 

Based on my own experience accumulated in more than 20 years of research, 
including the data discussed in paragraphs 4 and 5 above and my knowledge of 
the relevant scientific literature, it is my considered scientific opinion that for 
human genes, an increased level of mRNA in a tumor cell relative to a normal cell 
typically correlates to a similar increase in abundance of the encoded protein in 
the tumor cell relative to the normal cell. In fact, it remains a central dogma in 
molecular biology that increased mRNA levels are predictive of corresponding 
increased levels of the encoded protein. Polakis Declaration at ¶ 6 (emphasis 
added). 

Paragraphs 4 and 5 of the Polakis Declaration disclose that in the course of his research, 
which is closely related to the instant invention, Dr. Polakis has identified approximately 200 
gene transcripts that are differentially expressed in human tumors. He has generated antibodies 
to about 30 of the protein products. In paragraph 5, he states that: 

[T]here is a strong correlation between changes in the level of mRNA present in 
any particular cell type and the level of protein expressed from that mRNA in that 
cell type. In approximately 80% of our observations we have found that increases 
in the level of a particular mRNA correlates with changes in the level of protein 
expressed from that mRNA when human tumor cells are compared with their 
corresponding normal cells. Polakis Declaration at ¶ 5. 

Clearly, paragraphs 4 and 5 provide significant evidentiary support for his conclusions in 
paragraph 6. As to his statement that it is a central dogma of molecular biology that increases in 
mRNA lead to increases in protein, this statement is also supported by the data in paragraphs 4 
and 5, as well as Dr. Polakis' expertise and more than 20 years of research in the field. 
Appellants remind the Board that "[o]ffice personnel must accept an opinion from a qualified 
expert that is based upon relevant facts whose accuracy is not being questioned." PTO Utility 
Examination Guidelines (2001) (emphasis added). In addition, declarations relating to issues of 
fact should not be summarily dismissed as "opinions" without an adequate explanation of how 
the declaration fails to rebut the Examiner's position. In re Alton, 76 F.3d 1168 (Fed. Cir. 1996). 

The Examiner's second response to the Polakis declaration, which states that there is a correlation 
between changes in the level of mRNA and changes in the level of the encoded protein, is that: 

[I]t is important to note that the instant specification provides no information 
regarding differential mRNA levels of PRO 1277 in tumor samples as contrasted 
to normal tissue samples or the corresponding protein levels. Only gene 
amplification data were presented. Therefore the declaration is insufficient to 
overcome the [utility and enablement] rejection of claims 1-5... since it is limited 
to a discussion of data regarding the correlation of mRNA levels and polypeptide 
levels. Office Action at 13-14. 

As discussed above, this assertion is simply wrong. The data in Example 18 are differential 
mRNA data regarding the level of PRO 1277 mRNA in tumor samples compared to normal tissue 
samples. 

Appellants also address the Examiner's statement that further research is required for the 
invention, and that this requirement "is borne out by Grimaldi assertion that 'additional studies 
can then be conducted if further information is desired.'" Office Action at 13. 

The Examiner's reliance on the quote from the first Grimaldi declaration is clearly 
misplaced when read in context: 

7. The results of the gene expression studies indicate that the genes of 
interest can be used to differentiate tumor from normal. The precise levels of 
gene expression are irrelevant; what matters is that there is a relative difference in 
expression between normal tissue and tumor tissue. . . . If a difference is detected, 
this indicates that the gene and its corresponding polypeptide and antibodies 
against the polypeptide are useful for diagnostic purposes, to screen samples 
to differentiate between normal and tumor. Additional studies can then be 
conducted if further information is desired. First Grimaldi Declaration at ¶ 7 
(emphasis added). 

It is obvious that Mr. Grimaldi was stating that it is his expert opinion that the information 
provided in Example 18 is sufficient to use the gene, protein and antibody as diagnostic tools, 
and that no further testing or information is required. However, if additional information is 
desired, such as the role of the gene or protein in cancer formation or growth, additional studies 
can be conducted. It is disingenuous of the Examiner to take this quote out of context to suggest 
that Mr. Grimaldi is stating that further research is required to use the claimed invention when 
the remainder of his declaration clearly states otherwise. 

Finally, the Examiner concludes his arguments by stating that "even if there were a 
correlation between mRNA levels and protein levels, Applicants have not established a nexus 
between the cDNA of instant invention and PRO 1277 protein.... Whether or not increased levels 
of PRO 1277 mRNA correlates with increased levels of PRO 1277 protein is not an issue." Office 
Action at 17. 

There is an obvious nexus between the PRO 1277 cDNA and the PRO 1277 polypeptide. 
First, as explained above, cDNA is made from mRNA, and thus the changes in PRO 1277 cDNA 
reported in Example 18 reflect changes in PRO 1277 mRNA. Second, as described in numerous 
references and declarations above, regulation of mRNA is the primary method for controlling the 
expression of a gene, and there is a general correlation between changes in mRNA levels and 
changes in protein levels. Because PRO 1277 cDNA and mRNA encode the PRO 1277 
polypeptide, changes in PRO 1277 mRNA levels lead to changes in PRO 1277 polypeptide levels 
- the nexus between mRNA and the encoded protein is well-established. 

In conclusion, none of the Examiner's responses to Appellants' supporting evidence are 
sufficient to rebut Appellants' asserted utility. 

10. The Courts have held that the Utility Requirement was Satisfied in Similar 
Cases 

The seminal decision interpreting the utility requirement of 35 U.S.C. § 101 is Brenner v. 
Manson, 383 U.S. 519, 148 U.S.P.Q. 689 (1966). At issue in Brenner was a claim to "a 
chemical process which yields an already known product whose utility - other than as a possible 
object of scientific inquiry - ha[d] not yet been evidenced." Id. at 529, 148 U.S.P.Q. at 693. The 
Patent Office rejected the claimed process for lack of utility because the product produced by the 
claimed process had no known use. See id. at 521-22, 148 U.S.P.Q. at 690. On appeal, the Court 
of Customs and Patent Appeals reversed, holding "where a claimed process produces a known 
product it is not necessary to show utility for the product." Id. at 522, 148 U.S.P.Q. at 691. 

In reviewing the lower court's decision, the Court made its oft quoted statement that 
"[t]he basic quid pro quo contemplated by the Constitution and the Congress for granting a 
patent monopoly is the benefit derived by the public from an invention with substantial utility. 
Unless and until a process is refined and developed to this point - where specific benefit exists in 
currently available form - there is insufficient justification for permitting an applicant to engross 
what may prove to be a broad field." Id. at 534-35, 148 U.S.P.Q. at 695. 

The first opinion of the C.C.P.A. applying Brenner was In re Kirk, 376 F.2d 936, 153 
U.S.P.Q. 48 (C.C.P.A. 1967). The invention claimed in Kirk was a set of steroid derivatives said 
to have valuable biological properties and to be of value "in the furtherance of steroidal research 
and in the application of steroidal materials to veterinary or medical practice." Id. at 938, 153 
U.S.P.Q. at 50. In affirming the claim rejection based on a lack of utility, the court held that the 
"nebulous expressions 'biological activity' or 'biological properties'" did not adequately convey 
how to use the claimed compounds. Id. at 941, 153 U.S.P.Q. at 52. The court also rejected 
Appellants' supporting affidavit, stating, "the sum and substance of the affidavit appears to be 
that one of ordinary skill in the art would know 'how to use' the compounds to find out in the 
first instance whether the compounds are - or are not - in fact useful or possess useful properties, 
and to ascertain what those properties are." Id. at 942, 153 U.S.P.Q. at 53. 

Since these early decisions, the courts have continued to clarify what is sufficient to 
satisfy the utility requirement. Three more recent decisions are of particular relevance to the 
instant application: Nelson v. Bowler, 626 F.2d 853, 206 U.S.P.Q. 881 (C.C.P.A. 1980), Cross v. 
Iizuka, 753 F.2d 1040, 224 U.S.P.Q. 739 (Fed. Cir. 1985), and Fujikawa v. Wattanasin, 93 F.3d 
1559, 39 U.S.P.Q.2d 1895 (Fed. Cir. 1996). 

The earliest of these cases, Nelson v. Bowler, involved an interference between two 
applications related to derivatives of naturally occurring prostaglandins (PG). Nelson, 626 F.2d 
at 854-55. The issue was whether Nelson had shown at least one utility for the compounds at 
issue to establish an actual reduction to practice. Id. at 855. The Appellants relied on two tests 
to prove practical utility: an in vivo rat blood pressure (BP) test and an in vitro gerbil colon 
smooth muscle stimulation (GC-SMS) test. In the BP test, the blood pressure of anesthetized 
rats was recorded on a polygraph chart to determine whether an injected compound had any 
effect. Responses were categorized as either a depressor (lowering) effect or a pressor 
(elevating) effect. Id. In the GC-SMS test a section of colon was excised from a freshly-killed 
gerbil for suspension in a physiological solution, and a lever arm was connected to the colon in 
such a way that any contraction was recorded as a polygraph trace. Id. The Board held that 
Nelson had not shown adequate proof of practical utility, characterizing the tests as "rough 
screens, uncorrelated with actual utility." Id. at 856. 

On appeal the C.C.P.A. reversed, holding that the Board "erred in not recognizing that 
tests evidencing pharmacological activity may manifest a practical utility even though they may 
not establish a specific therapeutic use." Id. The Court stated that "practical utility" was 
characterized as a use of the claimed discovery in a manner which provides some immediate 
benefit to the public, establishing the following rule: 

Knowledge of the pharmacological activity of any compound is obviously 
beneficial to the public. It is inherently faster and easier to combat illnesses and 
alleviate symptoms when the medical profession is armed with an arsenal of 
chemicals having known pharmacological activities. Since it is crucial to provide 
researchers with an incentive to disclose pharmacological activities in as many 
compounds as possible, we conclude that adequate proof of any such activity 
constitutes a showing of practical utility. Id. (emphasis added). 

The Court rejected Bowler's argument that the BP and GC-SMS tests are inconclusive 
showings of pharmacological activity since confirmation by statistically significant means did 
not occur until after the critical date. The Court stated that "a rigorous correlation is not 
necessary where the test for pharmacological activity is reasonably indicative of the desired 
response." Id. (emphasis added). The Court concluded that a "reasonable correlation" between 
the observed properties and the suggested use was sufficient to establish practical utility. Id. at 
857. 

The sufficiency of a "reasonable correlation" in establishing utility was affirmed by the 
Court of Appeals for the Federal Circuit in Cross v. Iizuka, 753 F.2d 1040, 224 U.S.P.Q. 739 
(Fed. Cir. 1985). In Cross, the subject of the interference before the Court was imidazole 
derivative compounds which inhibit thromboxane synthetase, an enzyme which 
leads to the formation of thromboxane A2. At the time the applications were filed, 
thromboxane A2 was postulated to be involved in platelet aggregation, which was associated 
with several deleterious conditions. Id. at 1042. 

The question before the Board and reviewed by the Court was whether Iizuka was 
entitled to the benefit of his Japanese priority application. Id. The Japanese application 
disclosed that the imidazole derivatives showed strong inhibitory action for thromboxane 
synthetase from human or bovine platelet microsomes, an in vitro utility. Id. at 1043. Relying in 
part on Nelson, the Board held that tests evidencing pharmacological activity may manifest a 
practical utility even though they may not establish a specific therapeutic use, and concluded that 
the in vitro tests were sufficient to establish a practical utility. Id. 

On appeal, Cross argued that the basic in vitro tests conducted in cellular fractions did not 
establish a practical utility for the claimed compounds, and that more sophisticated in vitro or in 
vivo tests were necessary to establish a practical utility. Id. at 1050. The Court rejected this 
argument, noting that adequate proof of any pharmacological activity constitutes a showing of 
practical utility. Id. The Court accepted the argument that initial testing of compounds is widely 
done in vitro: 

[I]n vitro results . . . are generally predictive of in vivo test results, i.e., there is a 
reasonable correlation therebetween. Were this not so, the testing procedures of 
the pharmaceutical industry would not be as they are. Iizuka has not urged, and 
rightly so, that there is an invariable exact correlation between in vitro test results 
and in vivo test results. Rather, Iizuka's position is that successful in vitro testing 
for a particular pharmacological activity establishes a significant probability that 
in vivo testing for this particular pharmacological activity will be successful. Id. 
(emphasis added). 

The Court also noted that in previous decisions, its predecessor court had accepted 
evidence of in vivo utility as sufficient to establish practical utility. The Court reasoned that: 

This in vivo testing is but an intermediate link in a screening chain which may 
eventually lead to the use of the drug as a therapeutic agent in humans. We 
perceive no insurmountable difficulty, under appropriate circumstances, in finding 
that the first link in the screening chain, in vitro testing, may establish a practical 
utility for the compound in question. Successful in vitro testing will marshal 
resources and direct the expenditure of effort to further in vivo testing of the most 
potent compounds, thereby providing an immediate benefit to the public, 
analogous to the benefit provided by the showing of an in vivo utility. Id. at 1051, 
citing Nelson, 626 F.2d at 856 (emphasis added). 

Based on this reasoning, the Court affirmed the decision of the Board, stating that "based 
upon the relevant evidence as a whole, there is a reasonable correlation between the disclosed in 
vitro utility and an in vivo activity, and therefore a rigorous correlation is not necessary where 
the disclosure of pharmacological activity is reasonable based upon the probative evidence." Id. 
at 1050 (emphasis added). The Court therefore held that the disclosed in vitro utility was 
"sufficient to comply with the practical utility requirement of § 101." Id. at 1051. 

The holdings of Nelson and Cross were more recently affirmed in Fujikawa v. 
Wattanasin, 93 F.3d 1559, 39 U.S.P.Q.2d 1895 (Fed. Cir. 1996). In Fujikawa, the Court again 
affirmed the notion that initial screens of compounds provide a practical utility even though they 
may not provide a therapeutic use because "'[i]t is inherently faster and easier to combat 
illnesses and alleviate symptoms when the medical profession is armed with an arsenal of 
chemicals having known pharmacological activities.'" Id. at 1564, quoting Nelson, 626 F.2d at 
856. The Court noted that it may be difficult to predict whether novel compounds will exhibit 
pharmacological activity, and consequently testing is often required to establish practical utility. 
Id. However, the Court went on to state: 

But the test results need not absolutely prove that the compound is 
pharmacologically active. All that is required is that the tests be "reasonably 
indicative of the desired [pharmacological] response." In other words, there must 
be a sufficient correlation between the tests and an asserted pharmacological 
activity so as to convince those skilled in the art, to a reasonable probability, that 
the novel compound will exhibit the asserted pharmacological behavior. Id. 
(internal citations omitted, underline emphasis added, italics in original). 

On appeal, Fujikawa argued that Wattanasin had failed to establish an adequate 
correlation between the in vitro and in vivo results to permit Wattanasin to rely on positive in 
vitro results to establish a practical utility. The Court stated that the Board relied on testimony 
from those skilled in the art that the in vitro results convinced the experts that the claimed 
compounds would exhibit the desired pharmacological activity when administered in vivo, 
including testimony that in vivo activity is typically highly correlatable to a compound's in vitro 
activity in the field. Id. at 1565. To overcome this evidence and counter the Board's decision, 
Fujikawa pointed to the testimony of its expert that "there is a reasonable element of doubt that 
some elements may be encountered which are active in the in vitro assay, but yet inactive in the 
in vivo assay." Id. 

The Court rejected this argument: "Of course, it is possible that some compounds active 
in vitro may not be active in vivo. But, as our predecessor court in Nelson explained, a 'rigorous 
correlation' need not be shown in order to establish practical utility; 'reasonable correlation' 
suffices." Id. (emphasis added). The Court also rejected Fujikawa's reliance on two articles. 
The Court noted that while one article taught that "in vitro testing is sometimes not a good 
indicator of how potent a compound will be in vivo, it does imply that compounds which are 
active in vitro will normally exhibit some in vivo activity." Id. at 1566. Similarly, the Court 
noted that the second article expressly stated that "[f]or most substances, although not for all, the 
relative potency determined in in vitro . . . parallels the in vivo activity." Id. 

The Court concluded that the facts in the case were analogous to the ones in Cross where 
the court relied on a known reasonable correlation between in vitro tests and in vivo activity, and 
therefore affirmed the Board's decision that Wattanasin had established a practical utility with 
the in vitro results. Id. at 1565-66. 

The Nelson, Cross, and Fujikawa cases are very similar to the present case. The 
reasoning of the courts in all three cases that "'[i]t is inherently faster and easier to combat 
illnesses and alleviate symptoms when the medical profession is armed with an arsenal of 
chemicals having known pharmacological activities'" applies to the asserted utility for the 
claimed antibodies. Fujikawa, 93 F.3d at 1564, quoting Nelson, 626 F.2d at 856; see also Cross, 
753 F.2d at 1051 ("Successful in vitro testing will marshal resources and direct the expenditure 
of effort to further in vivo testing of the most potent compounds, thereby providing an immediate 
benefit to the public, analogous to the benefit provided by the showing of an in vivo utility."). 
Like pharmaceutical compounds, nucleic acids, polypeptides, and antibodies which are 
associated with cancer will make it inherently faster and easier to combat cancer. The greater the 
number of biological markers of cancer medical professionals have access to, the more accurate 
and detailed a diagnosis they can make. The determination that a gene is differentially expressed 
in cancer constitutes at least as significant a development in the field of cancer diagnostics as in 
vitro screening for pharmaceutical activity. See Cross, 753 F.2d at 1051 ("the first link in the 
screening chain, in vitro testing, may establish a practical utility for the compound in question. 
Successful in vitro testing will marshal resources and direct the expenditure of effort to further in 
vivo testing of the most potent compounds, thereby providing an immediate benefit to the 
public"). 

In addition, like in vitro tests in the pharmaceutical industry, those of skill in the field of 
biotechnology rely on the reasonable correlation that exists between gene expression and protein 
expression (see discussion supra). Were there no reasonable correlation between the two, the 
techniques that measure gene levels such as microarray analysis, differential display, and 
quantitative PCR would not be so widely used by those in the art. See Second Grimaldi 
Declaration at ¶ 5. As in Cross, Appellants here do not argue that there is "an invariable exact 
correlation" between gene expression and protein expression. See Cross, 753 F.2d at 1050. 
Instead, Appellants' position detailed above is that a measured change in gene expression in 
cancer cells establishes a "significant probability" that the expression of the encoded polypeptide 
in cancer will also be changed based on "a reasonable correlation therebetween." Id.; see also 
Fujikawa, 93 F.3d at 1565 ("a 'rigorous correlation' need not be shown in order to establish 
practical utility; 'reasonable correlation' suffices"); Nelson, 626 F.2d at 857 (holding that "a 
rigorous correlation is not necessary" and that a "reasonable correlation" will suffice). 

Also of importance is the Court's rejection of the notion that any in vitro testing must be 
statistically significant to support a practical utility. Nelson, 626 F.2d at 857. Likewise, 
qualitative characterizations of a test compound as either increasing or decreasing blood pressure 
were acceptable. Id. at 855 (stating that responses were categorized as either a depressor 
(lowering) effect or a pressor (elevating) effect). This is similar to the data in Example 18, 
where the change in mRNA levels is described as "more highly expressed." 

There are additional similarities. In Fujikawa, the Board and Court rejected the argument 
that there was no utility because there was no exact correlation between the in vitro and in vivo 
results in spite of supporting testimony and references. Fujikawa, 93 F.3d at 1565-66. Like the 
two references rejected by the Board and Court in Fujikawa, the Chen et al. reference cited by 
the Examiner may suggest that the correlation between changes in mRNA levels and protein 
levels is not exact. But as in Fujikawa, portions of Chen et al. also support Appellants' assertion, 
and Appellants have submitted the declarations of two experts in the field, which state that those in 
the field rely on the correlation between changes in mRNA and protein. See Second Grimaldi 
Declaration at ¶ 5; Polakis Declaration at ¶ 6. Thus, as was the case in Fujikawa, although there 
may be some evidence that the correlation relied on is not exact, the declarations and numerous 
references submitted by Appellants are more than enough evidence to establish that there is a 
"reasonable correlation" between changes in mRNA levels and changes in the level of the 
encoded protein. 

In conclusion, Appellants have asserted that the claimed antibodies are useful for the 
diagnosis of cancer, particularly esophageal and skin cancer, based on the data in Example 18. 
This utility is far beyond the nebulous expressions "biological activity" or "biological properties" 
rejected in In re Kirk, 376 F.2d 936, 153 U.S.P.Q. 48 (C.C.P.A. 1967). Like Nelson, Cross, and 
Fujikawa, Appellants have asserted a utility which relies on a reasonable correlation between the 
data disclosed in the application and the asserted utility. The fact that there may be limited 
evidence that the correlation is not exact does not invalidate Appellants' showing of utility since 
the correlation need not be a rigorous or exact one. Considering the relevant evidence as a whole, 
Appellants have provided sufficient evidence to establish a reasonable correlation between 
changes in the level of mRNA and corresponding changes in the level of the encoded 
polypeptide. Therefore the claimed antibodies have a practical utility as diagnostic tools for 
esophageal and skin cancer. 

11. Utility - Conclusion 

Appellants' asserted utility for the claimed antibodies as diagnostic tools for cancer 
corresponds in scope to the subject matter sought to be patented and therefore "must be taken as 
sufficient to satisfy the utility requirement of § 101 for the entire claimed subject." In re Langer, 
503 F.2d 1380, 1391, 183 U.S.P.Q. 288, 297 (C.C.P.A. 1974). The Examiner's unsupported 
arguments and references are not sufficient evidence to make a prima facie showing that "one of 
ordinary skill in the art would reasonably doubt the asserted utility." In re Brana, 51 F.3d 1560, 
1566, 34 U.S.P.Q.2d 1436 (Fed. Cir. 1995). 

And even if the Examiner has established a prima facie case, Appellants have offered 
sufficient rebuttal evidence in the form of expert declarations and references, which, when 
considered as a whole, establish that it is more likely than not that the asserted utility is true. See 
In re Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d 1443, 1444 (Fed. Cir. 1992) (stating that the 
evidentiary standard to be used throughout ex parte examination in setting forth a rejection is a 
preponderance of the evidence, or "more likely than not" standard); M.P.E.P. at § 2107.02, part 
VII ("evidence will be sufficient if, considered as a whole, it leads a person of ordinary skill in 
the art to conclude that the asserted utility is more likely than not true.") (emphasis in original). 

Finally, the courts' decisions in similar cases make clear that the evidence provided by 
Appellants is sufficient to establish the asserted utility. The evidence does not need to be direct 
evidence, nor does it need to provide an exact correlation between the submitted evidence and 
the asserted utility. Instead, evidence which is "reasonably" correlated with the asserted utility is 
sufficient. See Fujikawa, 93 F.3d at 1565 ("a 'rigorous correlation' need not be shown in order 
to establish practical utility; 'reasonable correlation' suffices"); Cross, 753 F.2d at 1050 (same); 
Nelson, 626 F.2d at 857 (same). Considering the evidence as a whole in light of the relevant 
cases, the Board should find that Appellants have established at least one specific, substantial, 
and credible utility, and the Examiner's rejection of the pending claims as lacking utility should 
be reversed. 

C. Enablement Rejection - Detailed Argument 

The second issue before the Board is whether Appellants have enabled the pending 
claims such that one of skill in the art would be able to make and use the claimed invention. The 
Examiner has rejected Claims 1-5 under 35 U.S.C. §112, first paragraph, arguing that because 
the claimed invention is not supported by either a specific or substantial asserted utility or a well- 
established utility, one skilled in the art clearly would not know how to use the claimed invention. 
See First Office Action at 8; Final Office Action at 3. 

1. Because the Claimed Invention is Supported by a Specific, Substantial and 
Credible Utility, the Enablement Rejection Should be Reversed 

For the reasons stated above, the claimed invention is supported by a specific, substantial 
and credible utility. Because the lack of a supporting utility is the only basis for the Examiner's 
rejection under 35 U.S.C. §112, first paragraph, the Board should reverse the rejection of Claims 
1-5 as lacking enablement. 

D. Conclusion 

In view of the arguments presented above, Appellants submit that the specification as 
filed provides a specific, substantial and credible utility for the claimed antibodies and request 
withdrawal of the rejection under 35 U.S.C. §101, and the related rejection under 35 U.S.C. §112. 

Please charge any additional fees, including any fees for additional extension of time, or 
credit overpayment to Deposit Account No. 11-1410. 



Respectfully submitted, 



KNOBBE, MARTENS, OLSON & BEAR, LLP 




Customer No. 30,313 
(619) 235-8550 






Appl. No. : 10/063,540 

Filed : May 2, 2002 



VIII. APPENDIX A - CLAIMS ON APPEAL 

1. An isolated antibody that specifically binds to the polypeptide of SEQ ID NO:34. 

2. The antibody of Claim 1 which is a monoclonal antibody. 

3. The antibody of Claim 1 which is a humanized antibody. 

4. The antibody of Claim 1 which is an antibody fragment. 

5. The antibody of Claim 1 which is labeled. 

IX. APPENDIX B - EVIDENCE 

Attached hereto is a copy of the evidence cited in Appellants' Brief. The list of evidence 
below is accompanied by a statement setting forth where in the record that evidence was entered 
into the record by the Examiner. 



Tab | Reference | Submitted | Entered 
1 | Pennica et al. (Proc. Natl. Acad. Sci. USA (1998) 95:14717-14722) | | Cited by Examiner in the first Office Action 
2 | Sen et al. (Curr. Opin. Oncol. (2000) 12:82-88) | | Cited by Examiner in the first Office Action 
3 | Hu et al. (J. Proteome Res., (2003) 2(4):405-12) | | Cited by Examiner in the final Office Action 
4 | First Declaration of J. Christopher Grimaldi | Originally submitted with Appellants' Amendment and Response to Office Action as Exhibit 2 | Entered by Examiner in final Office Action 
5 | Haynes et al. (Electrophoresis, (1998) 19(11):1862-71) | | Cited by Examiner in the final Office Action 
6 | Chen et al. (Mol. and Cell Proteomics, (2002) 1:304-313) | | Cited by Examiner in the final Office Action 
7 | Gygi et al., Molecular and Cellular Biology, Mar. 1999, 1720-1730 | Originally submitted with Appellants' Amendment and Response to Final Office Action as Exhibit 5 | Entered by Examiner in Advisory Action as confirmed by Examiner in telephone conversation 
8 | Second Declaration by J. Christopher Grimaldi | Originally submitted with Appellants' Amendment and Response to Office Action as Exhibit 6 | Entered by Examiner in final Office Action 
9 | Declaration of Paul Polakis, Ph.D. | Originally submitted with Appellants' Amendment and Response to Office Action as Exhibit 7 | Entered by Examiner in final Office Action 
10 | Bruce Alberts, et al., Molecular Biology of the Cell (3rd ed. 1994) (hereinafter "Cell 3rd") | Originally submitted with Appellants' Amendment and Response to Final Office Action as Exhibit 6 | Entered by Examiner in Advisory Action as confirmed by Examiner in telephone conversation 
11 | Bruce Alberts, et al., Molecular Biology of the Cell (4th ed. 2002) | Originally submitted with Appellants' Amendment and Response to Final Office Action as Exhibit 7 | Entered by Examiner in Advisory Action as confirmed by Examiner in telephone conversation 
12 | Benjamin Lewin, Genes VI (1997) | Originally submitted with Appellants' Amendment and Response to Final Office Action as Exhibit 8 | Entered by Examiner in Advisory Action as confirmed by Examiner in telephone conversation 
13 | Zhigang et al., World Journal of Surgical Oncology 2:13, 2004 | Originally submitted with Appellants' Amendment and Response to Final Office Action as Exhibit 9 | Entered by Examiner in Advisory Action as confirmed by Examiner in telephone conversation 
14 | Meric et al., Molecular Cancer Therapeutics, vol. 1, 971-979 (2002) | Originally submitted with Appellants' Amendment and Response to Final Office Action as Exhibit 10 | Entered by Examiner in Advisory Action as confirmed by Examiner in telephone conversation 
15 | Page 122 of the 2002-2003 New England Biolabs catalog. | Originally submitted with Appellants' Amendment and Response to Final Office Action as Exhibit 1 | Entered by Examiner in Advisory Action as confirmed by Examiner in telephone conversation 

There are no decisions rendered by a court or the Board in any related proceedings 
identified above. 

WISP genes are members of the connective tissue growth factor 
family that are up-regulated in Wnt-1-transformed cells and 
aberrantly expressed in human colon tumors 

Contributed by David Botstein and Arnold J. Levine, October 21, 1998 

ABSTRACT Wnt family members are critical to many 
developmental processes, and components of the Wnt signal- 
ing pathway have been linked to tumorigenesis in familial and 
sporadic colon carcinomas. Here we report the identification 
of two genes, WISP-1 and WISP-2, that are up-regulated in the 
mouse mammary epithelial cell line C57MG transformed by 
Wnt-1, but not by Wnt-4. Together with a third related gene, 
WISP-3, these proteins define a subfamily of the connective 
tissue growth factor family. Two distinct systems demon- 
strated WISP induction to be associated with the expression of 
Wnt-1. These included (i) C57MG cells infected with a Wnt-1 
retroviral vector or expressing Wnt-1 under the control of a 
tetracycline-repressible promoter, and (ii) Wnt-1 transgenic 
mice. The WISP-1 gene was localized to human chromosome 
8q24.1-8q24.3. WISP-1 genomic DNA was amplified in colon 
cancer cell lines and in human colon tumors and its RNA 
overexpressed (2- to >30-fold) in 84% of the tumors examined 
compared with patient-matched normal mucosa. WISP-3 
mapped to chromosome 6q22-6q23 and also was overexpressed 
(4- to >40-fold) in 63% of the colon tumors analyzed. 
In contrast, WISP-2 mapped to human chromosome 20q12- 
20q13 and its DNA was amplified, but RNA expression was 
reduced (2- to > 30-fold) in 79% of the tumors. These results 
suggest that the WISP genes may be downstream of Wnt-1 
signaling and that aberrant levels of WISP expression in colon 
cancer may play a role in colon tumorigenesis. 



Wnt-1 is a member of an expanding family of cysteine-rich, 
glycosylated signaling proteins that mediate diverse develop- 
mental processes such as the control of cell proliferation, 
adhesion, cell polarity, and the establishment of cell fates (1, 
2). Wnt-1 originally was identified as an oncogene activated by 
the insertion of mouse mammary tumor virus in virus-induced 
mammary adenocarcinomas (3, 4). Although Wnt-1 is not 
expressed in the normal mammary gland, expression of Wnt-1 
in transgenic mice causes mammary tumors (5). 

In mammalian cells, Wnt family members initiate signaling 
by binding to the seven-transmembrane spanning Frizzled 
receptors and recruiting the cytoplasmic protein Dishevelled 
(Dsh) to the cell membrane (1, 2, 6). Dsh then inhibits the 
kinase activity of the normally constitutively active glycogen 
synthase kinase-3β (GSK-3β), resulting in an increase in 
β-catenin levels. Stabilized β-catenin interacts with the transcription 
factor TCF/Lef1, forming a complex that appears in 
the nucleus and binds TCF/Lef1 target DNA elements to 
activate transcription (7, 8). Other experiments suggest that 
the adenomatous polyposis coli (APC) tumor suppressor gene 
also plays an important role in Wnt signaling by regulating 
β-catenin levels (9). APC is phosphorylated by GSK-3β, binds 
to β-catenin, and facilitates its degradation. Mutations in 
either APC or β-catenin have been associated with colon 
carcinomas and melanomas, suggesting these mutations con- 
tribute to the development of these types of cancer, implicating 
the Wnt pathway in tumorigenesis (1). 

Although much has been learned about the Wnt signaling 
pathway over the past several years, only a few of the tran- 
scriptionally activated downstream components activated by 
Wnt have been characterized. Those that have been described 
cannot account for all of the diverse functions attributed to 
Wnt signaling. Among the candidate Wnt target genes are 
those encoding the nodal-related 3 gene, Xnr3, a member of 
the transforming growth factor (TGF)-β superfamily, and the 
homeobox genes, engrailed, goosecoid, twin (Xtwn), and siamois 
(2). A recent report also identifies c-myc as a target gene of the 
Wnt signaling pathway (10). 

To identify additional downstream genes in the Wnt signal- 
ing pathway that are relevant to the transformed cell pheno- 
type, we used a PCR-based cDNA subtraction strategy, sup- 
pression subtractive hybridization (SSH) (11), using RNA 
isolated from C57MG mouse mammary epithelial cells and 
C57MG cells stably transformed by a Wnt-1 retrovirus. Over- 
expression of Wnt-1 in this cell line is sufficient to induce a 
partially transformed phenotype, characterized by elongated 
and refractile cells that lose contact inhibition and form a 
multilayered array (12, 13). We reasoned that genes differen- 
tially expressed between these two cell lines might contribute 
to the transformed phenotype. 

In this paper, we describe the cloning and characterization 
of two genes up-regulated in Wnt-1 transformed cells, WISP-1 
and WISP-2, and a third related gene, WISP-3. The WISP genes 
are members of the CCN family of growth factors, which 
includes connective tissue growth factor (CTGF), Cyr61, and 
nov, a family not previously linked to Wnt signaling. 

MATERIALS AND METHODS 

SSH. SSH was performed by using the PCR-Select cDNA 
Subtraction Kit (CLONTECH). Tester double-stranded 
cDNA was synthesized from 2 μg of poly(A)+ RNA isolated 
from the C57MG/Wnt-1 cell line and driver cDNA from 2 μg 
of poly(A)+ RNA from the parent C57MG cells. The subtracted 
cDNA library was subcloned into a pGEM-T vector for 
further analysis. 

Abbreviations: TGF, transforming growth factor; CTGF, connective 
tissue growth factor; SSH, suppression subtractive hybridization; 
VWC, von Willebrand factor type C module. 
Data deposition: The sequences reported in this paper have been 
deposited in the GenBank database (accession nos. AF100777, 
AF100778, AF100779, AF100780, and AF100781). 
To whom reprint requests should be addressed. e-mail: diane@gene.com. 

cDNA Library Screening. Clones encoding full-length 
mouse WISP-1 were isolated by screening a λgt10 mouse 
embryo cDNA library (CLONTECH) with a 70-bp probe from 
the original partial clone 568 sequence corresponding to amino 
acids 128-169. Clones encoding full-length human WISP-1 
were isolated by screening λgt10 lung and fetal kidney cDNA 
libraries with the same probe at low stringency. Clones encoding 
full-length mouse and human WISP-2 were isolated by 
screening a C57MG/Wnt-1 or human fetal lung cDNA library 
with a probe corresponding to nucleotides 1463-1512. Full-length 
cDNAs encoding WISP-3 were cloned from human 
bone marrow and fetal kidney libraries. 

Expression of Human WISP RNA. PCR amplification of 
first-strand cDNA was performed with human Multiple Tissue 
cDNA panels (CLONTECH) and 300 μM of each dNTP at 
94°C for 1 sec, 62°C for 30 sec, 72°C for 1 min, for 22-32 cycles. 
WISP and glyceraldehyde-3-phosphate dehydrogenase primer 
sequences are available on request. 

In Situ Hybridization. 33P-labeled sense and antisense riboprobes 
were transcribed from an 897-bp PCR product corresponding 
to nucleotides 601-1440 of mouse WISP-1 or a 
294-bp PCR product corresponding to nucleotides 82-375 of 
mouse WISP-2. All tissues were processed as described (40). 

Radiation Hybrid Mapping. Genomic DNA from each 
hybrid in the Stanford G3 and Genebridge4 Radiation Hybrid 
Panels (Research Genetics, Huntsville, AL) and human and 
hamster control DNAs were PCR-amplified, and the results 
were submitted to the Stanford or Massachusetts Institute of 
Technology web servers. 

Cell Lines, Tumors, and Mucosa Specimens. Tissue speci- 
mens were obtained from the Department of Pathology (Uni- 
versity of Pittsburgh) for patients undergoing colon resection 
and from the University of Leeds, United Kingdom. Genomic 
DNA was isolated (Qiagen) from the pooled blood of 10 
normal human donors, surgical specimens, and the following 
ATCC human cell lines: SW480, COLO 320DM, HT-29, 
WiDr, and SW403 (colon adenocarcinomas), SW620 (lymph 
node metastasis, colon adenocarcinoma), HCT 116 (colon 
carcinoma), SK-CO-1 (colon adenocarcinoma, ascites), and 
HM7 (a variant of ATCC colon adenocarcinoma cell line LS 
174T). DNA concentration was determined by using Hoechst 
dye 33258 intercalation fluorimetry. Total RNA was prepared 
by homogenization in 7 M GuSCN followed by centrifugation 
over CsCl cushions or prepared by using RNAzol. 

Gene Amplification and RNA Expression Analysis. Relative 
gene amplification and RNA expression of WISPs and c-myc in 
the cell lines, colorectal tumors, and normal mucosa were 
determined by quantitative PCR. Gene-specific primers and 
fluorogenic probes (sequences available on request) were 
designed and used to amplify and quantitate the genes. The 
relative gene copy number was derived by using the formula 
2^(ΔCt), where ΔCt represents the difference in amplification 
cycles required to detect the WISP genes in peripheral blood 
lymphocyte DNA compared with colon tumor DNA or colon 
tumor RNA compared with normal mucosal RNA. The 
δ-method was used for calculation of the SE of the gene copy 
number or RNA expression level. The WISP-specific signal was 
normalized to that of the glyceraldehyde-3-phosphate dehydrogenase 
housekeeping gene. All TaqMan assay reagents 
were obtained from Perkin-Elmer Applied Biosystems. 
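
The relative-quantitation formula described above can be illustrated with a minimal sketch. It assumes ideal doubling of product per PCR cycle and assumes that normalization to the housekeeping gene is done by subtracting its cycle count from that of the target gene; the text states only that the signal was normalized, so that step, the function names, and the cycle values in the usage note are hypothetical and are not taken from the paper.

```python
def relative_level(ct_reference: float, ct_sample: float) -> float:
    """Return 2**(delta Ct), with delta Ct = Ct(reference) - Ct(sample).

    A sample detected in fewer cycles than the reference gives a value
    above 1 (amplification or overexpression); more cycles gives a value
    below 1. Assumes product doubles every cycle.
    """
    return 2.0 ** (ct_reference - ct_sample)


def gapdh_normalized_level(
    ct_target_reference: float,
    ct_gapdh_reference: float,
    ct_target_sample: float,
    ct_gapdh_sample: float,
) -> float:
    """Relative level after normalizing each specimen to its GAPDH signal.

    Assumption (not stated in the source): normalization is performed by
    subtracting the housekeeping-gene Ct from the target-gene Ct within
    each specimen before taking the difference between specimens.
    """
    delta_reference = ct_target_reference - ct_gapdh_reference
    delta_sample = ct_target_sample - ct_gapdh_sample
    return 2.0 ** (delta_reference - delta_sample)
```

With hypothetical cycle counts, relative_level(30.0, 27.0) evaluates to 8.0: a target detected three cycles earlier in the tumor specimen than in the reference corresponds to an eight-fold higher level under the doubling assumption.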

RESULTS 

Isolation of WISP-1 and WISP-2 by SSH. To identify Wnt- 
1-inducible genes, we used the technique of SSH using the 
mouse mammary epithelial cell line C57MG and C57MG cells 
that stably express Wnt-1 (11). Candidate differentially ex- 
pressed cDNAs (1,384 total) were sequenced. Thirty-nine 
percent of the sequences matched known genes or homo- 
logues, 32% matched expressed sequence tags, and 29% had 
no match. To confirm that the transcript was differentially 
expressed, semiquantitative reverse transcription-PCR and 
Northern analysis were performed by using mRNA from the 
C57MG and C57MG/Wnt-1 cells. 

Two of the cDNAs, WISP-1 and WISP-2, were differentially 
expressed, being induced in the C57MG/Wnt-1 cell line, but 
not in the parent C57MG cells or C57MG cells overexpressing 
Wnt-4 (Fig. 1A and B). Wnt-4, unlike Wnt-1, does not induce 
the morphological transformation of C57MG cells and has no 
effect on β-catenin levels (13, 14). Expression of WISP-1 was 
up-regulated approximately 3-fold in the C57MG/Wnt-1 cell 
line and WISP-2 by approximately 5-fold by both Northern 
analysis and reverse transcription-PCR. 

An independent, but similar, system was used to examine 
WISP expression after Wnt-1 induction. C57MG cells express- 
ing the Wnt-1 gene under the control of a tetracycline- 
repressible promoter produce low amounts of Wnt-1 in the 
repressed state but show a strong induction of Wnt-1 mRNA 
and protein within 24 hr after tetracycline removal (8). The 
levels of Wnt-1 and WISP RNA isolated from these cells at 
various times after tetracycline removal were assessed by 
quantitative PCR. Strong induction of Wnt-1 mRNA was seen 
as early as 10 hr after tetracycline removal. Induction of WISP 
mRNA (2- to 6-fold) was seen at 48 and 72 hr (data not shown). 
These data support our previous observations that show that 
WISP induction is correlated with Wnt-1 expression. Because 
the induction is slow, occurring after approximately 48 hr, the 
induction of WISPs may be an indirect response to Wnt-1 
signaling. 

cDNA clones of human WISP-1 were isolated and the 
sequence compared with mouse WISP-1. The cDNA sequences 
of mouse and human WISP-1 were 1,766 and 2,830 bp in length, 
respectively, and encode proteins of 367 aa, with predicted 
relative molecular masses of ~40,000 (Mr 40 K). Both have 
hydrophobic N-terminal signal sequences, 38 conserved cys- 
teine residues, and four potential N-linked glycosylation sites 
and are 84% identical (Fig. 2A). 

Full-length cDNA clones of mouse and human WISP-2 were 
1,734 and 1,293 bp in length, respectively, and encode proteins 
of 251 and 250 aa, respectively, with predicted relative molecular 
masses of ~27,000 (Mr 27 K) (Fig. 2B). Mouse and human 
WISP-2 are 73% identical. Human WISP-2 has no potential 
N-linked glycosylation sites, and mouse WISP-2 has one at 

[Fig. 1 blot image; lanes: C57MG parent, Wnt-1, Wnt-4] 

Fig. 1. WISP-1 and WISP-2 are induced by Wnt-1, but not Wnt-4, 
expression in C57MG cells. Northern analysis of WISP-1 (A) and 
WISP-2 (B) expression in C57MG, C57MG/Wnt-1, and C57MG/ 
Wnt-4 cells. Poly(A)+ RNA (2 μg) was subjected to Northern blot 
analysis and hybridized with a 70-bp mouse WISP-1-specific probe 
(amino acids 278-300) or a 190-bp WISP-2-specific probe (nucleotides 
1438-1627) in the 3' untranslated region. Blots were rehybridized with 
human β-actin probe. 
Fig. 2. Encoded amino acid sequence alignment of mouse and 
human WISP-1 (A) and mouse and human WISP-2 (B). The potential 
signal sequence, insulin-like growth factor-binding protein (IGF-BP), 
VWC, thrombospondin (TSP), and C-terminal (CT) domains are 
underlined. 

position 197. WISP-2 has 28 cysteine residues that are con- 
served among the 38 cysteines found in WISP-1. 

Identification of WISP-3. To search for related proteins, we 
screened expressed sequence tag (EST) databases with the 
WISP-1 protein sequence and identified several ESTs as 
potentially related sequences. We identified a homologous 
protein that we have called WISP-3. A full-length human 
WISP-3 cDNA of 1,371 bp was isolated corresponding to those 
ESTs that encode a 354-aa protein with a predicted molecular 
mass of 39,293. WISP-3 has two potential N-linked glycosyl- 
ation sites and 36 cysteine residues. An alignment of the three 
human WISP proteins shows that WISP-1 and WISP-3 are the 
most similar (42% identity), whereas WISP-2 has 37% identity 
with WISP-1 and 32% identity with WISP-3 (Fig. 3A). 
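
To make the notion of percent identity concrete, the sketch below counts matching residues between two pre-aligned sequences. It is only an illustration under stated assumptions: the text does not say which denominator was used for the reported values, so the choice here (aligned columns without gaps) is an assumption, and the two short sequences are invented rather than taken from the WISP proteins.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two sequences that are already aligned.

    Assumption: columns containing a gap in either sequence are excluded
    from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy alignment: 9 ungapped columns, 8 of them identical.
seq_a = "MRWFLPWTLA"
seq_b = "MRW-LPWALA"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give somewhat different percentages for the same alignment.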

WISPs Are Homologous to the CTGF Family of Proteins. 
Human WISP-1, WISP-2, and WISP-3 are novel sequences; 
however, mouse WISP-1 is the same as the recently identified 
Elm1 gene. Elm1 is expressed in low, but not high, metastatic 
mouse melanoma cells, and suppresses the in vivo growth and 
metastatic potential of K-1735 mouse melanoma cells (15). 
Human and mouse WISP-2 are homologous to the recently 
described rat gene, rCop-1 (16). Significant homology (36- 
44%) was seen to the CCN family of growth factors. This family 
includes three members, CTGF, Cyr61, and the protooncogene 
nov. CTGF is a chemotactic and mitogenic factor for 
fibroblasts that is implicated in wound healing and fibrotic 
disorders and is induced by TGF-β (17). Cyr61 is an extracellular 
matrix signaling molecule that promotes cell adhesion, 
proliferation, migration, angiogenesis, and tumor growth (18, 
19). nov (nephroblastoma overexpressed) is an immediate 
early gene associated with quiescence and found altered in 
Wilms tumors (20). The proteins of the CCN family share 
functional, but not sequence, similarity to Wnt-1. All are 
secreted, cysteine-rich heparin binding glycoproteins that as- 
sociate with the cell surface and extracellular matrix. 

WISP proteins exhibit the modular architecture of the CCN 
family, characterized by four conserved cysteine-rich domains 
(Fig. 3B) (21). The N-terminal domain, which includes the first 
12 cysteine residues, contains a consensus sequence (GCGC- 
CXXC) conserved in most insulin-like growth factor (IGF)- 
Fig. 3. (A) Encoded amino acid sequence alignment of human 
WISPs. The cysteine residues of WISP-1 and WISP-2 that are not 
present in WISP-3 are indicated with a dot. (B) Schematic represen- 
tation of the WISP proteins showing the domain structure and cysteine 
residues (vertical lines). The four cysteine residues in the VWC domain 
that are absent in WISP-3 are indicated with a dot. (C) Expression of 
WISP mRNA in human tissues. PCR was performed on human 
multiple-tissue cDNA panels (CLONTECH) from the indicated adult 
and fetal tissues. 

binding proteins (BP). This sequence is conserved in WISP-2 
and WISP-3, whereas WISP-1 has a glutamine in the third 
position instead of a glycine. CTGF recently has been shown 
to specifically bind IGF (22) and a truncated nov protein 
lacking the IGF-BP domain is oncogenic (23). The von Wil- 
lebrand factor type C module (VWC), also found in certain 
collagens and mucins, covers the next 10 cysteine residues, and 
is thought to participate in protein complex formation and 
oligomerization (24). The VWC domain of WISP-3 differs 
from all CCN family members described previously, in that it 
contains only six of the 10 cysteine residues (Fig. 3 A and B). 
A short variable region follows the VWC domain. The third 
module, the thrombospondin (TSP) domain is involved in 
binding to sulfated glycoconjugates and contains six cysteine 
residues and a conserved WSxCSxxCG motif first identified in 
thrombospondin (25). The C-terminal (CT) module contain- 
ing the remaining 10 cysteines is thought to be involved in 
dimerization and receptor binding (26). The CT domain is 
present in all CCN family members described to date but is 
absent in WISP-2 (Fig. 3 A and B). The existence of a putative 
signal sequence and the absence of a transmembrane domain 
suggest that WISPs are secreted proteins, an observation 
supported by an analysis of their expression and secretion from 
mammalian cell and baculovirus cultures (data not shown). 

Expression of WISP mRNA in Human Tissues. Tissue- 
specific expression of human WISPs was characterized by PCR 
analysis on adult and fetal multiple tissue cDNA panels. 
WISP-1 expression was seen in the adult heart, kidney, lung, 
pancreas, placenta, ovary, small intestine, and spleen (Fig. 3C). 
Little or no expression was detected in the brain, liver, skeletal 
muscle, colon, peripheral blood leukocytes, prostate, testis, or 
thymus. WISP-2 had a more restricted tissue expression and 
was detected in adult skeletal muscle, colon, ovary, and fetal 
lung. Predominant expression of WISP-3 was seen in adult 
kidney and testis and fetal kidney. Lower levels of WISP-3 
expression were detected in placenta, ovary, prostate, and 
small intestine. 

In Situ Localization of WISP-1 and WISP-2. Expression of 
WISP-1 and WISP-2 was assessed by in situ hybridization in 
mammary tumors from Wnt-1 transgenic mice. Strong expres- 
sion of WISP-1 was observed in stromal fibroblasts lying within 
the fibrovascular tumor stroma (Fig. 4 A-D). However, low- 
level WISP-1 expression also was observed focally within tumor 
cells (data not shown). No expression was observed in normal 
breast. Like WISP-1, WISP-2 expression also was seen in the 
tumor stroma in breast tumors from Wnt-1 transgenic animals 
(Fig. 4 E-H). However, WISP-2 expression in the stroma was 
in spindle-shaped cells adjacent to capillary vessels, whereas 






Fig. 4. (A, C, E, and G) Representative hematoxylin/eosin-stained 
images from breast tumors in Wnt-1 transgenic mice. The correspond- 
ing dark-field images showing WISP-1 expression are shown in B and 
D. The tumor is a moderately well-differentiated adenocarcinoma 
showing evidence of adenoid cystic change. At low power (A and B), 
expression of WISP-1 is seen in the delicate branching fibrovascular 
tumor stroma (arrowhead). At higher magnification, expression is seen 
in the stromal(s) fibroblasts (C and D), and tumor cells are negative. 
Focal expression of WISP-1, however, was observed in tumor cells in 
some areas. Images of WISP-2 expression are shown in E-H. At low 
power (E and F), expression of WISP-2 is seen in cells lying within the 
fibrovascular tumor stroma. At higher magnification, these cells 
appeared to be adjacent to capillary vessels whereas tumor cells are 
negative (G and H). 



the predominant cell type expressing WISP-1 was the stromal 
fibroblasts. 

Chromosome Localization of the WISP Genes. The chro- 
mosomal location of the human WISP genes was determined 
by radiation hybrid mapping panels. WISP-1 is approximately 
3.48 cR from the meiotic marker AFM259xc5 [logarithm of 
odds (lod) score 16.31] on chromosome 8q24.1 to 8q24.3, in the 
same region as the human locus of the novH family member 
(27) and roughly 4 Mbs distal to c-myc (28). Preliminary fine 
mapping indicates that WISP-1 is located near D8S1712 STS. 
WISP-2 is linked to the marker SHGC-33922 (lod = 1,000) on 
chromosome 20q12-20q13.1. Human WISP-3 mapped to chromosome 
6q22-6q23 and is linked to the marker AFM211ze5 
(lod = 1,000). WISP-3 is approximately 18 Mbs proximal to 
CTGF and 23 Mbs proximal to the human cellular oncogene 
MYB (27, 29). 

Amplification and Aberrant Expression of WISPs in Human 
Colon Tumors. Amplification of protooncogenes is seen in 
many human tumors and has etiological and prognostic sig- 
nificance. For example, in a variety of tumor types, c-myc 
amplification has been associated with malignant progression 
and poor prognosis (30). Because WISP-1 resides in the same 
general chromosomal location (8q24) as c-myc, we asked 
whether it was a target of gene amplification, and, if so, 
whether this amplification was independent of the c-myc locus. 
Genomic DNA from human colon cancer cell lines was 
assessed by quantitative PCR and Southern blot analysis (Fig. 
5 A and B). Both methods detected similar degrees of WISP-1 
amplification. Most cell lines showed significant (2- to 4-fold) 
amplification, with the HT-29 and WiDr cell lines demonstrat- 
ing an 8-fold increase. Significantly, the pattern of amplifica- 
tion observed did not correlate with that observed for c-myc, 
indicating that the c-myc gene is not part of the amplicon that 
involves the WISP-1 locus. 

We next examined whether the WISP genes were amplified 
in a panel of 25 primary human colon adenocarcinomas. The 
relative WISP gene copy number in each colon tumor DNA 
was compared with pooled normal DNA from 10 donors by 
quantitative PCR (Fig. 6). The copy number of WISP-1 and 
WISP-2 was significantly greater than one, approximately 
2-fold for WISP-1 in about 60% of the tumors and 2- to 4-fold 
for WISP-2 in 92% of the tumors (P < 0.001 for each). The 
copy number for WISP-3 was indistinguishable from one (P = 
0.166). In addition, the copy number of WISP-2 was significantly 
higher than that of WISP-1 (P < 0.001). 

The levels of WISP transcripts in RNA isolated from 19 
adenocarcinomas and their matched normal mucosa were 
Fig. 5. Amplification of WISP-1 genomic DNA in colon cancer cell 
lines. (A) Amplification in cell line DNA was determined by quantitative 
PCR. (B) Southern blots containing genomic DNA (10 μg) 
digested with EcoRI (WISP-1) or XbaI (c-myc) were hybridized with 
a 100-bp human WISP-1 probe (amino acids 186-219) or a human 
c-myc probe (located at bp 1901-2000). The WISP and myc genes are 
detected in normal human genomic DNA after a longer film exposure. 

Fig. 6. Genomic amplification of WISP genes in human colon 
tumors. The relative gene copy number of the WISP genes in 25 
adenocarcinomas was assayed by quantitative PCR, by comparing 
DNA from primary human tumors with pooled DNA from 10 healthy 
donors. The data are means ± SEM from one experiment done in 
triplicate. The experiment was repeated at least three times. 

assessed by quantitative PCR (Fig. 7). The level of WISP-1 
RNA present in tumor tissue varied but was significantly 
increased (2- to >25-fold) in 84% (16/19) of the human colon 
tumors examined compared with normal adjacent mucosa. 
Four of 19 tumors showed greater than 10-fold overexpression. 
In contrast, in 79% (15/19) of the tumors examined, WISP-2 
RNA expression was significantly lower in the tumor than the 
mucosa. Similar to WISP-1, WISP-3 RNA was overexpressed in 
63% (12/19) of the colon tumors compared with the normal 
Fig. 7. WISP RNA expression in primary human colon tumors 
relative to expression in normal mucosa from the same patient. 
Expression of WISP mRNA in 19 adenocarcinomas was assayed by 
quantitative PCR. The Dukes stage of the tumor is listed under the 
sample number. The data are means ± SEM from one experiment 
done in triplicate. The experiment was repeated at least twice. 



mucosa. The amount of overexpression of WISP-3 ranged from 
4- to >40-fold. 



DISCUSSION 

One approach to understanding the molecular basis of cancer 
is to identify differences in gene expression between cancer 
cells and normal cells. Strategies based on assumptions that 
steady-state mRNA levels will differ between normal and 
malignant cells have been used to clone differentially ex- 
pressed genes (31). We have used a PCR-based selection 
strategy, SSH, to identify genes selectively expressed in 
C57MG mouse mammary epithelial cells transformed by 
Wnt-1. 

Three of the genes isolated, WISP-1, WISP-2, and WISP-3, 
are members of the CCN family of growth factors, which 
includes CTGF, Cyr61, and nov, a family not previously linked 
to Wnt signaling. 

Two independent experimental systems demonstrated that 
WISP induction was associated with the expression of Wnt-1. 
The first was C57MG cells infected with a Wnt-1 retroviral 
vector or C57MG cells expressing Wnt-1 under the control of 
a tetracycline-repressible promoter, and the second was in 
Wnt-1 transgenic mice, where breast tissue expresses Wnt-1, 
whereas normal breast tissue does not. No WISP RNA expres- 
sion was detected in mammary tumors induced by polyoma 
virus middle T antigen (data not shown). These data suggest 
a link between Wnt-1 and WISPs in that in these two situations, 
WISP induction was correlated with Wnt-1 expression. 

It is not clear whether the WISPs are directly or indirectly 
induced by the downstream components of the Wnt-1 signaling 
pathway (i.e., β-catenin-TCF-1/Lef1). The increased levels of 
WISP RNA were measured in Wnt-1-transformed cells, hours 
or days after Wnt-1 transformation. Thus, WISP expression 
could result from Wnt-1 signaling directly through β-catenin 
transcription factor regulation or alternatively through Wnt-1 
signaling turning on a transcription factor, which in turn 
regulates WISPs. 

The WISPs define an additional subfamily of the CCN family 
of growth factors. One striking difference observed in the 
protein sequence of WISP-2 is the absence of a CT domain, 
which is present in CTGF, Cyr61, nov, WISP-1, and WISP-3. 
This domain is thought to be involved in receptor binding and 
dimerization. Growth factors, such as TGF-β, platelet-derived 
growth factor, and nerve growth factor, which contain a cystine 
knot motif exist as dimers (32). It is tempting to speculate that 
WISP-1 and WISP-3 may exist as dimers, whereas WISP-2 
exists as a monomer. If the CT domain is also important for 
receptor binding, WISP-2 may bind its receptor through a 
different region of the molecule than the other CCN family 
members. No specific receptors have been identified for CTGF 
or nov. A recent report has shown that integrin αvβ3 serves as 
an adhesion receptor for Cyr61 (33). 

The strong expression of WISP-1 and WISP-2 in cells lying 
within the fibrovascular tumor stroma in breast tumors from 
Wnt-1 transgenic animals is consistent with previous obser- 
vations that transcripts for the related CTGF gene are pri- 
marily expressed in the fibrous stroma of mammary tumors 
(34). Epithelial cells are thought to control the proliferation of 
connective tissue stroma in mammary tumors by a cascade of 
growth factor signals similar to that controlling connective 
tissue formation during wound repair. It has been proposed 
that mammary tumor cells or inflammatory cells at the tumor 
interstitial interface secrete TGF-β1, which is the stimulus for 
stromal proliferation (34). TGF-β1 is secreted by a large 
percentage of malignant breast tumors and may be one of the 
growth factors that stimulates the production of CTGF and 
WISPs in the stroma. 

It was of interest that WISP-1 and WISP-2 expression was 
observed in the stromal cells that surrounded the tumor cells 
(epithelial cells) in the Wnt-1 transgenic mouse sections of 
breast tissue. This finding suggests that paracrine signaling 
could occur in which the stromal cells could supply WISP-1 and 
WISP-2 to regulate tumor cell growth on the WISP extracel- 
lular matrix. Stromal cell-derived factors in the extracellular 
matrix have been postulated to play a role in tumor cell 
migration and proliferation (35). The localization of WISP-1 
and WISP-2 in the stromal cells of breast tumors supports this 
paracrine model. 

An analysis of WISP-1 gene amplification and expression in 
human colon tumors showed a correlation between DNA 
amplification and overexpression, whereas overexpression of 
WISP-3 RNA was seen in the absence of DNA amplification. 
In contrast, WISP-2 DNA was amplified in the colon tumors, 
but its mRNA expression was significantly reduced in the 
majority of tumors compared with the expression in normal 
colonic mucosa from the same patient. The gene for human 
WISP-2 was localized to chromosome 20q12-20q13, at a region 
frequently amplified and associated with poor prognosis in 
node negative breast cancer and many colon cancers, suggest- 
ing the existence of one or more oncogenes at this locus 
(36-38). Because the center of the 20q13 amplicon has not yet 
been identified, it is possible that the apparent amplification 
observed for WISP-2 may be caused by another gene in this 
amplicon. 

A recent manuscript on rCop-1, the rat orthologue of 
WISP-2, describes the loss of expression of this gene after cell 
transformation, suggesting it may be a negative regulator of 
growth in cell lines (16). Although the mechanism by which 
WISP-2 RNA expression is down-regulated during malignant 
transformation is unknown, the reduced expression of WISP-2 
in colon tumors and cell lines suggests that it may function as 
a tumor suppressor. These results show that the WISP genes 
are aberrantly expressed in colon cancer and suggest that their 
altered expression may confer selective growth advantage to 
the tumor. 

Members of the Wnt signaling pathway have been impli- 
cated in the pathogenesis of colon cancer, breast cancer, and 
melanoma, including the tumor suppressor gene adenomatous 
polyposis coli and β-catenin (39). Mutations in specific regions 
of either gene can cause the stabilization and accumulation of 
cytoplasmic β-catenin, which presumably contributes to human 
carcinogenesis through the activation of target genes such 
as the WISPs. Although the mechanism by which Wnt-1 
transforms cells and induces tumorigenesis is unknown, the 
identification of WISPs as genes that may be regulated down- 
stream of Wnt-1 in C57MG cells suggests they could be 
important mediators of Wnt-1 transformation. The amplifica- 
tion and altered expression patterns of the WISPs in human 
colon tumors may indicate an important role for these genes 
in tumor development. 
Aneuploidy and cancer 

Subrata Sen, PhD 



Numeric aberrations in chromosomes, referred to as aneuploidy, 
are commonly observed in human cancer. Whether aneuploidy 
is a cause or consequence of cancer has long been 
debated. Three lines of evidence now make a compelling case 
for aneuploidy being a discrete chromosome mutation event 
that contributes to malignant transformation and progression 
process. First, precise assay of chromosome aneuploidy in 
several primary tumors with in situ hybridization and compara- 
tive genomic hybridization techniques have revealed that 
specific chromosome aneusomies correlate with distinct tumor 
phenotypes. Second, aneuploid tumor cell lines and in vitro 
transformed rodent cells have been reported to display an 
elevated rate of chromosome instability, thereby indicating that 
aneuploidy is a dynamic chromosome mutation event associ- 
ated with transformation of cells. Third, and most important, a 
number of mitotic genes regulating chromosome segregation 
have been found mutated in human cancer cells, implicating 
such mutations in induction of aneuploidy in tumors. Some of 
these gene mutations, possibly allowing unequal segregations 
of chromosomes, also cause tumorigenic transformation of 
cells in vitro. In this review, the recent publications investigat- 
ing aneuploidy in human cancers, rate of chromosome instabil- 
ity in aneuploidy tumor cells, and genes implicated in regulat- 
ing chromosome segregation found mutated in cancer cells 

are discussed. Curr Opin Oncol 2000, 12:82-88 © 2000 Lippincott Williams 
& Wilkins, Inc. 
Cancer research over the past decade has firmly established 
that malignant cells accumulate a large number of 
genetic mutations that affect differentiation, prolifera- 
tion, and cell death processes. In addition, it is also 
recognized that most cancers are clonal, although they 
display extensive heterogeneity with respect to kary- 
otypes and phenotypes of individual clonal populations. 
It is estimated that numeric chromosomal imbalance, 
referred to as aneuploidy, is the most prevalent genetic 
change recorded among over 20,000 solid tumors 
analyzed thus far. Phenotypic diversity of the clonal 
populations in individual tumors involves differences in 
morphology, proliferative properties, antigen expression, 
drug sensitivity, and metastatic potentials. It has been 
proposed that an underlying acquired genetic instability 
is responsible for the multiple mutations detected in 
cancer cells that lead to tumor heterogeneity and 
progression [2]. In a somewhat contradictory argument, 
it has also been suggested that clonal expansion due to 
selection of cells undergoing normal rates of mutation 
can explain malignant transformation and progression 
process in humans [3]. Acquired genetic instability, 
nonetheless, is considered important for more rapid 
progression of the disease [4]. Although the original 
hypothesis on genetic instability in cancer primarily 
focused on chromosome imbalances in the form of aneu- 
ploidy in tumor cells, the actual relevance of such muta- 
tions in cancer remains a controversial issue. 

Whether or not aneuploidy contributes to the malignant 
transformation and progression process has long been 
debated. A prevalent idea on genetics of cancer referred 
to as "somatic gene mutation hypothesis" contends that 
gene mutations at the nucleotide level alone can cause 
cancer by either activating cellular proto-oncogenes to 
dominant cancer causing oncogenes and/or by inactivat- 
ing growth inhibitory tumor suppressor genes. In this 
scheme of things chromosomal instability in the form of 
aneuploidy is a mere consequence rather than a cause of 
malignant transformation and progression process. 

In this review, some of the recent observations on the 
subject are discussed and compelling evidence is 
provided to suggest that aneuploidy is a distinct form of 
genetic instability in cancer that frequently correlates 
with specific phenotypes and stages of the disease. 
Furthermore, discrete genetic targets affecting chromo- 
somal stability in cancer cells, recently identified, are 
also discussed. These data provide a new direction 
toward elucidating the molecular mechanisms responsible 
for induction of aneuploidy in cancer and may eventually 
be exploited as novel therapeutic targets in the 
future. 

Genetic alterations in cancer 

Alterations in many genetic loci regulating growth, 
senescence, and apoptosis, identified in tumor cells, 
have led to the current understanding of cancer as a 
genetic disease. The genetic changes identified in 
tumors include: subtle mutations in genes at the 
nucleotide level; chromosomal translocations leading to 
structural rearrangements in genes; and numeric 
changes in either partial segments of chromosomes or 
whole chromosomes (aneuploidy) causing imbalance in 
gene dosage. 

For the purpose of this review, both segmental and whole 
chromosome imbalances leading to altered DNA dosage 
in cancer cells are included as examples of aneuploidy. 

Incidence of aneuploidy in cancer 

Evidence of aneuploidy involving one or more chromosomes 
has been commonly reported in human tumors. 
Although these observations were initially made using 
classic cytogenetic techniques late in a tumor's evolu- 
tion and were difficult to correlate with cancer progres- 
sion, more recent studies have reported association of 
specific nonrandom chromosome aneuploidy with 
different biologic properties such as loss of hormone 
dependence and metastatic potential [5]. 

Classic cytogenetic studies performed on tumor cells 
had serious limitations in scope because they were 
applicable only to those cases in which mitotic chromo- 
somes could be obtained. Because of low spontaneous 
rates of cell division in primary tumors, analyses 
depended on cells either derived selectively from 
advanced metastases or those grown in vitro for variable 
periods of time. In both instances, metaphases analyzed 
represented only a subset of primary tumor cell popula- 
tion. Two major advances in cytogenetic analytic tech- 
niques, in situ hybridization (ISH) and comparative 
genomic hybridization (CGH), have allowed better reso- 
lution of chromosomal aberrations in freshly isolated 
tumor cells [6]. ISH analyses with chromosome-specific 
DNA probes, a powerful adjunct to metaphasic analysis, 
allows assessment of chromosomal anomalies within 
tumor cell populations in the contexts of whole nuclear 
architecture and tissue organization. CGH allows 
genome wide screening of chromosomal anomalies 
without the use of specific probes even in the absence 
of prior knowledge of chromosomes involved. Although 
both techniques have certain limitations in terms of 
their resolution power, they nonetheless provide a 
better approximation of chromosomal changes occurring 
among tumors of various histology, grade, and stage 



compared with what was possible with the classic cyto- 
genetic techniques. Genomic ploidy measurements 
have also been performed at the DNA level with flow 
cytometry and cytofluorometric methods. Although 
these assays underestimate chromosome ploidy due to a 
chromosomal gain occasionally masking a chromosomal 
loss in the same cell, several studies using these 
methods have supported the conclusion that DNA 
aneuploidy closely associates with poor prognosis in 
various cancers [7,8]. This discussion of some recent 
examples published on aneuploidy in cancer includes 
discussion of studies dealing with DNA ploidy measure- 
ments as well. Most of these observations are correlative 
without direct proof of specific involvement of genes on 
the respective chromosomes. Identification of putative 
oncogenes and tumor suppressor genes on gained and 
lost chromosomes in aneuploid tumors, however, are 
providing strong evidence that chromosomes involved in 
aneuploidy play a critical role in the tumorigenic 
process. 

In renal tumors, either segmental or whole chromosome 
aneuploidy appears to be uniquely associated with 
specific histologic subtypes [9]. Tumors from patients 
with hereditary papillary renal carcinomas (HPRC) 
commonly show trisomy of chromosome 7, when 
analyzed by CGH. Germline mutations of a putative 
oncogene MET have been detected in patients with 
HPRC. A recent study [10] has demonstrated that an 
extra copy of chromosome 7 results in nonrandom dupli- 
cation of the mutant MET allele in HPRC, thereby 
implicating this trisomy in tumorigenesis. The study 
suggested that mutation of MET may render the cells 
more susceptible to errors in chromosome replication, 
and that clonal expansion of cells harboring duplicated 
chromosome 7 reflects their proliferative advantage. In 
addition to chromosome 7, trisomy of chromosome 17 in 
papillary tumors and also of chromosome 8 in mesoblas- 
tic nephroma are commonly seen. Association of specific 
chromosome imbalances with benign and malignant 
forms of papillary renal tumors, therefore, not only 
contribute to an understanding of tumor origins and 
evolution, but also implicate aneuploidy of the respec- 
tive chromosomes in the tumorigenic transformation 
process. 

In colorectal tumors, chromosome aneuploidy is a 
common occurrence. In fact, molecular allelotyping 
studies have suggested that limited karyotyping data 
available from these tumors actually underestimate the 
true extent of these changes. Losses of heterozygosity 
reflecting loss of the maternal or paternal allele in 
tumors are widespread and often accompanied by a gain 
of the opposite allele. Therefore, for example, a tumor 
could lose a maternal chromosome while duplicating 
the same paternal chromosome, leaving the tumor cell 
with a normal karyotype and ploidy but an aberrant 
allelotype. It has been estimated that cancer of the 
colon, breast, pancreas, or prostate may lose an average 
of 25% of its alleles. It is not unusual to discover that a 
tumor has lost over half of its alleles [4]. In clinical 
settings, DNA ploidy measurements have revealed that 
DNA aneuploidy indicates high risk of developing 
severe premalignant changes in patients with ulcerative 
colitis, who are known to have an increased risk of 
developing colorectal cancer [11]. DNA aneuploidy has 
been found to be one of the useful indicators of lymph 
node metastasis in patients with gastric carcinoma and 
associated with poor outcome compared with diploid 
cases [12,13]. CGH analyses of chromosome aneu- 
ploidy, on the other hand, was reported to correlate gain 
of chromosome 20q with high tumor S phase fractions 
and loss of 4q with low tumor apoptotic indices [14]. 
Aneuploidy of chromosome 4 in metastatic colorectal 
cancer has recently been confirmed in studies that used 
unbiased DNA fingerprinting with arbitrarily primed 
polymerase chain reactions to detect moderate gains 
and losses of specific chromosomal DNA sequences 
[15]. The molecular karyotype (amplotype) generated 
from colorectal cancer revealed that moderate gains of 
sequences from chromosomes 8 and 13 occurred in 
most tumors, suggesting that overrepresentation of 
these chromosomal regions is a critical step for metasta- 
tic colorectal cancer. 

In addition to being implicated in tumorigenesis and 
correlated with distinct tumor phenotypes, chromosome 
aneuploidy has been used as a marker of risk assessment 
and prognosis in several other cancers. The potential 
value of aneuploidy as a noninvasive tool to identify 
individuals at high risk of developing head and neck 
cancer appears especially promising. Interphase fluores- 
cence in situ hybridization (FISH) revealed extensive 
aneuploidy in tumors from patients with head and neck 
squamous cell carcinomas (HNSCC) and also in clini- 
cally normal distant oral regions from the same individu- 
als [16,17]. It has been proposed that a panel of chromo- 
some probes in FISH analyses may serve as an 
important tool to detect subclinical tumorigenesis and 
for diagnosis of residual disease. The presence of aneu- 
ploid or tetraploid populations is seen in 90% to 95% of 
esophageal adenocarcinomas, and when seen in 
conjunction with Barrett's esophagus, a premalignant 
condition, predicts progression of disease [18,19]. 
Chromosome ploidy analyses in conjunction with loss of 
heterozygosity and gene mutation studies in Barrett's 
esophagus reflect evolution of neoplastic cell lineages in 
vivo [20]. Evolution of neoplastic progeny from Barrett's 
esophagus following somatic genetic mutations 
frequently involves bifurcations and loss of heterozygos- 
ity at several chromosomal loci leading to aneuploidy 
and cancer. Accordingly, it is hypothesized that during
tumor cell evolution diploid cell progenitors with
somatic genetic abnormalities undergo expansion with 
acquired genetic instability. Such instability, often 
manifested in the form of increased incidence of aneu- 
ploidy, enters a phase of clonal evolution beginning in 
premalignant cells that proceeds over a period of time 
and occasionally leads to malignant transformation. The 
clonal evolution continues even after the emergence of 
cancer. 

The significance of DNA and chromosome aneuploidy
in other human cancers continues to be evaluated.
Among papillary thyroid carcinomas, aneuploid
DNA content in tumor cells was reported to correlate 
with distant metastases, reflecting worsened progno- 
sis [21]. Genome wide screening of follicular thyroid 
tumors by CGH, on the other hand, revealed frequent 
loss of chromosome 22 in widely invasive follicular 
carcinomas [22]. Chromosome copy number gains in 
invasive neoplasm compared with foci of ductal carci- 
noma in situ (DCIS) with similar histology have been 
proposed to indicate involvement of aneuploidy in 
progression of human breast cancer [23]. ISH analyses
of cervical intraepithelial neoplasia have provided
suggestive evidence that aneusomy of chromosomes 1, 7,
and X is associated with progression toward cervical
carcinoma [24].

Although the prognostic value of numeric aberrations 
remains a matter of debate in human hematopoietic 
neoplasia, there have been recent studies to suggest that 
the presence of monosomy 7 defines a distinct subgroup 
of acute myeloid leukemia patients [25]. It is interesting 
in this context that therapy-related myelodysplastic 
syndromes have been reported to display monosomy 5 
and 7 karyotypes, reflecting poor prognosis [26]. 

The clinical observations, mentioned previously, are 
supported by in vitro studies in human and rodent cells in 
which aneuploidy is induced at early stages of transformation
[27,28]. It has even been suggested that aneuploidy may, in
some instances, cause cell immortalization, a critical step
preceding transformation.

Finally, in an interesting study to develop transgenic 
mouse models of human chromosomal diseases, chromo- 
some segment specific duplication and deletions of the 
genome were reported to be constructed in mouse 
embryonic stem cells [29]. Three duplications for a 
portion of mouse chromosome 11 syntenic with human 
chromosome 17 were established in the mouse 
germline. Mice with a 1 Mb duplication developed corneal
hyperplasia and thymic tumors. The findings represent 
the first transgenic mouse model of aneuploidy of a 
defined chromosome segment that documents the direct 
role of chromosome aneusomy in tumorigenesis. 



Aneuploidy as "dynamic cancer-causing 
mutation" instead of a "consequential state" 
in cancer 

According to the hypothesis previously discussed, aneu- 
ploidy represents either a "gain of function" or "loss of 
function" mutation at the chromosome level with a 
causative influence on the tumorigenesis process. The 
hypothesis, however, is based only on circumstantial 
evidence even though existence of aneuploidy is corre- 
lated with different tumor phenotypes. The existence of 
numeric chromosomal alterations in a tumor does not 
mean that the change arose as a dynamic mutation due 
to genomic instability, because several factors could lead 
to consequential aneuploidy in tumors, also. Although 
aneuploidy as a dynamic mutation due to genomic insta- 
bility in tumor cells would occur at a certain measurable 
rate per cell generation, a consequential state of aneu- 
ploidy in tumors may not occur at a predictable rate 
under similar conditions or in tumors with similar 
phenotypes. In addition to genomic instability, differ- 
ences in environmental factors with selective pressure, 
could explain high incidence of aneuploidy and other 
somatic mutations in tumors compared with normal cells 
[4]. These include humoral, cell substratum, and cell- 
cell interaction differences between tumor and normal 
cell environments. It could be argued that despite 
similar rates of spontaneous aneuploidy induction in 
normal and tumor cells, the latter are selected to prolif- 
erate due to altered selective pressure in the tumor cell 
environment, whereas the normal cells are eliminated 
through activation of apoptosis. Alternatively, of course, 
one could postulate that selective expression or overex- 
pression of anti-apoptotic proteins or inactivation of 
proapoptotic proteins in tumor cells may counteract 
default induction of apoptosis in G2/M phase cells 
undergoing missegregation of chromosomes. Recent 
demonstration of overexpression of a G2/M phase anti- 
apoptotic protein survivin in cancer cells [30] suggests 
that this protein may favor aberrant progression of aneu- 
ploid transformed cells through mitosis. This would 
then lead to proliferation of aneuploid cell lineages, 
which may undergo clonal evolution. 

To ascertain that aneuploidy is a dynamic mutational 
event, various human tumor cell lines and transformed 
rodent cell lines have been analyzed for the rate of 
aneuploidy induction. Growing the cells under controlled in
vitro conditions ensures that environmental
factors do not influence selective proliferation of
cells with chromosome instability. In one study, 
Lengauer et al. [31] provided unequivocal evidence by
FISH analyses that losses or gains of multiple chromosomes
occurred in excess of 10^-2 per chromosome per
generation in aneuploid colorectal cancer cell lines. The 
study further concluded that such chromosomal instability
appeared to be a dominant trait. Using another in
vitro model system of Chinese hamster embryo (CHE)
cells, Duesberg et al. [32] have also obtained similar
results. With clonal cultures of CHE cells, transformed
with nongenotoxic chemicals and a mitotic inhibitor,
these authors demonstrated that the overwhelming 
majority of the transformed colonies contained more 
than 50% aneuploid cells, indicating that aneuploidy 
would have originated from the same cells that under- 
went transformation. All the transformed colonies tested 
were tumorigenic. It was further documented that the 
ploidy factor representing the quotient of the modal 
chromosome number divided by the normal diploid 
number, in each clone, correlated directly with the 
degree of chromosomal instability. Therefore, chromo- 
somal instability was found proportional to the degree of 
aneuploidy in the transformed cells and the authors 
hypothesized that aneuploidy is a unique mechanism of 
simultaneously altering and destabilizing, in a massive 
manner, the normal cellular phenotypes. In the absence 
of any evidence that the transforming chemicals used in 
the study did not induce other somatic mutations, it is 
difficult to rule out the contribution of such mutations 
in the transformation process. These results nonetheless 
make a strong case for aneuploidy being a dynamic chro- 
mosome mutation event intimately associated with 
cancer. 

Aneuploidy versus somatic gene mutation in 
cancer 

The idea that numeric chromosome imbalance or aneu- 
ploidy is a direct cause of cancer was proposed at the 
turn of the century by Theodore Boveri [33]. However, 
the hypothesis was largely ignored over the last several 
decades in favor of the somatic gene mutation hypothe- 
sis, mentioned earlier. Evidence accumulating in the 
literature lately on specific chromosome aneusomies 
recognized in primary tumors, incidence of aneuploidy 
in cells undergoing transformation, and aneuploid tumor 
cells showing a high rate of chromosome instability have 
led to the rejuvenation of Boveri's hypothesis. The 
concept has recently been discussed as a "vintage wine 
in a new bottle" [34*]. The author points out that 
except for rare cancers caused by dominant retroviral 
oncogenes, diploidy does not seem to occur in solid 
tumors, whereas aneuploidy is a rule rather than excep- 
tion in cancer. 

Aneuploidy as an effective mutagenic mechanism 
driving tumor progression, on the other hand, is being 
recognized as a viable solution to the paradox that, with the
known mutation rate in non-germline cells (~10^-7 per
gene per cell generation) tumor cell lineages cannot 
accumulate enough mutant genes during a human life- 
time [35]. The concept is gaining significant credibility 
since genes that potentially affect chromosome segregation
were found mutated in human cancer. Some of
these genes have also been shown to have transforming
capability in in vitro assays. Selected recent publications
describing these findings are discussed below with
reference to the mitotic targets potentially involved in
inducing chromosome segregation anomalies in cells. 

Potential mitotic targets and molecular 
mechanisms of aneuploidy 

Because aneuploidy represents numeric imbalance in 
chromosomes, it is reasonable to expect that aneuploidy 
arises due to missegregation of chromosomes during cell 
division. There are many potential mitotic targets, 
which could cause unequal segregation of chromosomes 
(Fig. 1). Recent investigations have identified several 
genes involved in regulating these mitotic targets and 
mitotic checkpoint functions, which can be implicated 
in induction of aneuploidy in tumor cells. This discus- 
sion is restricted to those mitotic targets and checkpoint 
genes whose abnormal functioning has been observed in 
cancer or has been shown to cause tumorigenic transfor- 
mation of cells, in recent years. The role of telomeres is 
discussed elsewhere in this issue. For a more detailed 
description of the components of mitotic machinery and 
their possible involvement in causing chromosome 
segregation abnormalities in tumor cells, readers may 
refer to a recently published review [36].

Among the mitotic targets implicated in cancer, centro- 
some defects have been observed in a wide variety of 
malignant human tumors. Centrosomes play a central role 
in organizing the microtubule network in interphase cells 
and mitotic spindle during cell division. Multipolar 
mitotic spindles have been observed in human cancers in 
situ and abnormalities in the form of supernumerary 



Figure 1. Potential mitotic targets causing aneuploidy in
oncogenesis
Diagram illustrates that defects in several processes involving chromosomal,
spindle microtubule, and centrosomal targets, in addition to abnormal cytokinesis,
may cause unequal partitioning of chromosomes during mitosis, leading to
aneuploidy. Recently obtained evidence in favor of some of these possibilities is
discussed in the text.



centrosomes, centrosomes of aberrant size and shape as 
well as aberrant phosphorylation of centrosome proteins 
have been reported in prostate, colon, brain, and breast 
tumors [37,38]. In view of the findings that abnormal 
centrosomes retain the ability to nucleate microtubules in 
vitro, it is conceivable that cells with abnormal centrosomes
may missegregate chromosomes, producing aneuploid
cells. The molecular and genetic bases of abnormal
centrosome generation and the precise pathway through 
which they regulate the chromosome segregation process 
remain to be elucidated. Recent discovery of a centro- 
some-associated kinase STK15/BTAK/aurora2, naturally 
amplified and overexpressed in human cancers, has raised 
the interesting possibility that aberrant expression of this 
kinase is critically involved in abnormal centrosome func- 
tion and unequal chromosome segregation in tumor cells 
[39,40]. Exogenous expression of the kinase in rodent and 
human cells was found to correlate with an abnormal 
number of centrosomes, unequal partitioning of chromo- 
somes during division, and tumorigenic transformation of 
cells. It is relevant in this context to mention that the 
Xenopus homologue of human STK15/BTAK/aurora2
kinase has recently been shown to phosphorylate a micro- 
tubule motor protein XlEg5, the human orthologue of 
which is known to participate in the centrosome separa- 
tion during mitosis [41]. Findings on STK15/aurora2 
kinase, thus, provide an interesting lead to a possible 
molecular mechanism of centrosome's role in oncogenesis.
Centrosomes have, of late, been implicated in oncogenesis
from studies revealing supernumerary centrosomes
in p53-deficient fibroblasts and overexpression of
another centrosome kinase PLK1 being detected in 
human non-small cell lung cancer [42]. 

One of the critical events that ensures equal partition- 
ing of the chromosomes during mitosis is the proper 
and timely separation of sister chromatids that are 
attached to each other and to the mitotic spindle. 
Untimely separation of sister chromatids has been 
suspected as a cause of aneuploidy in human tumors. 
Cohesion between sister chromatids is established 
during replication of chromosomes and is retained until 
the next metaphase/anaphase transition. It has been 
shown that during metaphase-anaphase transition, the 
anaphase promoting complex/cyclosome triggers the 
degradation of a group of proteins called securins that 
inhibit sister chromatid separation. A vertebrate securin 
(v-securin) has recently been identified that inhibits 
sister chromatid separation and is involved in transfor- 
mation and tumorigenesis. Subsequent analysis 
revealed that the human securin is identical to the 
product of the gene called pituitary tumor transforming 
gene, which is overexpressed in some tumors and
exhibits transforming activity in NIH3T3 cells. It is
proposed that elevated expression of the v-securin may 
contribute to generation of malignant tumors due to
chromosome gain or loss produced by errors in chromatid
separation [43].

Normal progression through mitosis during prophase to 
anaphase transition is monitored at least at two check- 
points. One checkpoint operates during early prophase 
at G2 to metaphase progression while the second 
ensures proper segregation of chromosomes during 
metaphase to anaphase transition. Several mitotic 
checkpoint genes responding to mitotic spindle defects 
have been identified in yeast. The metaphase-anaphase 
transition is delayed following activation of this check- 
point during which kinetochores remain unattached to 
the spindle. The signal is transmitted through a kinetochore
protein complex consisting of Mps1p and several
Mad and Bub proteins [44]. It is expected that for 
unequal chromosome segregation to be perpetuated 
through cell proliferation cycles giving rise to aneu- 
ploidy, checkpoint controls have to be abrogated. 

Following this logic, Vogelstein et al. [45] hypothesized
that aneuploid tumors would reveal mutation in mitotic 
spindle checkpoint genes. Subsequent studies by these 
investigators have proven the validity of this hypothesis 
and a small fraction of human colorectal cancers have 
revealed the presence of mutations in either hBub1 or
hBubR1 checkpoint genes. It was further revealed that
mutant BUB1 could function in a dominant negative 
manner conferring an abnormal spindle checkpoint 
when expressed exogenously. Inactivation of spindle 
checkpoint function in virally induced leukemia has also 
recently been documented following the finding that 
hMAD1 checkpoint protein is targeted by the Tax
protein of the human T-cell leukemia virus type 1.
Abrogation of hMAD1 function leads to multinucleation
and aneuploidy [46].

In addition to mitotic spindle checkpoint defects, failed 
DNA damage checkpoint function in yeast is frequently 
associated with aberrant chromosome segregation as 
well. It, therefore, appears intriguing yet relevant that 
the human BRCA1 gene, proposed to be involved in 
DNA damage checkpoint function, when mutated by a 
targeted deletion of exon 11, led to defective G2/M cell
cycle checkpoint function and genetic instability in 
mouse embryonic fibroblasts [47]. The cells revealed 
multiple functional centrosomes and unequal chromo- 
some segregation and aneuploidy. Although the molecu- 
lar basis for these abnormalities is not known at this 
time, it raises the interesting possibility that such an 
aneuploidy-driven mechanism may be involved in 
tumorigenesis in individuals carrying germline muta- 
tions of BRCA1 gene. 



Conclusion 

Growing evidence from human tumor cytogenetic investigations
strongly suggests that aneuploidy is associated
with the development of tumor phenotypes. Clinical 
findings of correlation between aneuploidy and tumori- 
genesis are supported by studies with in vitro grown 
transformed cell lines. Molecular genetic analyses of 
tumor cells provide credible evidence that mutations in 
genes controlling chromosome segregation during 
mitosis play a critical role in causing chromosome insta- 
bility leading to aneuploidy in cancer. Further elucida- 
tion of molecular and physiologic bases of chromosome 
instability and aneuploidy induction could lead to the 
development of new therapeutic approaches for 
common forms of cancer.

High-throughput technologies, such as proteomic screening and DNA micro-arrays, produce vast
amounts of data requiring comprehensive analytical methods to decipher the biologically relevant
results. One approach would be to manually search the biomedical literature; however, this would be
an arduous task. We developed an automated literature-mining tool, termed MedGene, which
comprehensively summarizes and estimates the relative strengths of all human gene-disease
relationships in Medline. Using MedGene, we analyzed a novel micro-array expression dataset
comparing breast cancer and normal breast tissue in the context of existing knowledge. We found no
correlation between the strength of the literature association and the magnitude of the difference in
expression level when considering changes as high as 5-fold; however, a significant correlation was
observed (r = 0.41; p = 0.05) among genes showing an expression difference of 10-fold or more.
Interestingly, this only held true for estrogen receptor (ER) positive tumors, not ER negative. MedGene
identified a set of relatively understudied, yet highly expressed genes in ER negative tumors worthy of
further examination.

Introduction

At its current pace, the accumulation of biomedical literature
outpaces the ability of most researchers and clinicians to stay
abreast of their own immediate fields, let alone cover a broader
range of topics. For example, to follow a single disease, e.g.,
breast cancer, a researcher would have had to scan 130 different
journals and read 27 papers per day in 199D. 1 This problem is
accentuated with high-throughput technologies such as DNA
micro-arrays and proteomics, which require the analysis of
large datasets involving thousands of genes, many of which are
unfamiliar to a particular researcher. In any micro-array experiment,
thousands of genes may demonstrate statistically significant
expression changes, but only a fraction of these may
be relevant to the study. The ability to interpret these datasets
would be enhanced if they could be compared to a comprehensive
summary of what is known about all genes. Thus, there
is a need to summarize existing knowledge in a format that
allows for the rapid analysis of associations between genes and
diseases or other specific biological concepts.

One solution to this problem is to compile structured digital
resources, such as the Breast Cancer Gene Database 1 and the
Tumor Gene Database. 2 However, as these resources are hand-curated,
the labor-intensive review process becomes a rate-limiting
step in the growth of the database. As a result, these
databases have a limited scale and the genes are not selected
in a systematic fashion.

An alternative approach is automated text mining, a method
that involves automated information extraction by searching
documents for text strings and analyzing their frequency and
context. This approach has been used successfully in several
instances for biological applications. In most cases, it has been
applied to extract information about the relationships or
interactions that proteins or genes have with one another, in
the literature or by functional annotation. 3-7 Thus far, few
publications have applied text mining to examine the global
relationships between genes and diseases. Perez-Iratxeta et al.
automatically examined the GO (Gene Ontology) annotation
of genes and their predicted chromosomal locations in order
to identify genes linked to inherited disorders. 8

To obtain a more global understanding of disease development,
it would be valuable to incorporate information regarding
all possible gene-disease relationships, including biochemical,
physiological, pharmacological, epidemiological, as well as
genetic. This information would enable comprehensive comparisons
between large experimental datasets and existing
knowledge in the literature. This would accomplish two things.
First, it would serve to validate experiments by demonstrating
that known responses occur as predicted. Second, it would
rapidly highlight which genes are corroborated by the literature
and which genes are novel in a given context. We have utilized
a computational approach to literature mining to produce a
comprehensive set of gene-disease relationships. In addition,
we have developed a novel approach to assess the strength of
each association based on the frequency of citation and co-citation.
We applied this tool to help interpret the data from a
large micro-array gene expression experiment comparing
normal and cancerous breast tissue.

Methods 

MedGene Database. MedGene is a relational database, storing
disease and gene information from NCBI, text mining results,
statistical scores, and hyperlinks to the primary literature.
MedGene has a web-based user interface for users to
query the database (http://hipseq.med.harvard.edu/MedGene/).

Text Mining Algorithms. MeSH files were downloaded from
the MeSH website at NLM (National Library of Medicine)
(http://www.nlm.nih.gov/mesh/meshhome.html) and human disease
categories were selected. LocusLink files were downloaded from
the LocusLink web site at NCBI (http://www.ncbi.nih.gov/LocusLink/).
Official/preferred gene symbol, official/preferred
gene name, and gene alternative symbols and names, all
relevant annotations and URLs for each LocusLink record, were
collected. Gene search terms were used for literature searching
and included all qualified gene names, gene symbols, and gene
family terms. Primary gene keys, predominantly qualified gene
family terms and gene official/preferred symbols, were used
to index Medline records. If the official/preferred gene symbols
did not meet the standards to be an index, then qualified gene
official/preferred names were used. A local copy of Medline
records (up to July, 2002) was pre-selected.

A JAVA module examined the MeSH terms and then indexed
each Medline record with the appropriate disease terms. A
separate JAVA module was used to examine the titles and
abstracts for gene search terms and then to index the gene-related
Medline records with the relevant primary gene key(s).

Statistical Methods. For every gene and disease pair, we
counted records that were indexed for both gene and disease
(double positive hits), for disease only (disease single hits), for
gene only (gene single hits), and for neither gene nor disease
(double negative hits) to generate a 2 x 2 contingency table.
On the basis of the contingency table framework, we applied
different statistical methods to estimate the strength of gene-disease
relationships and evaluated the results. These methods
included chi-square analysis, Fisher's exact probabilities, relative
risk of gene, and relative risk of disease 16
(http://hipseq.med.harvard.edu/MedGene/). In addition, we computed
the "product of frequency", which is the product of the
proportion of disease/gene double hits to disease single hits
and the proportion of disease/gene double hits to gene single
hits. To obtain a normal distribution, we transformed all the
statistical scores using the natural logarithm. We selected the
log of the product of frequency (LPF) to validate MedGene and
to use for the analysis with the micro-array data. Spearman
rank-correlation coefficients were used to assess the linear
relationship between LPF and micro-array fold change in
expression level.
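
The scoring and correlation steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the MedGene implementation: the function names, the gene labels, all counts and fold-change values, and the choice to return no score when any count is zero are assumptions made for the example. It follows the wording above literally, taking "single hits" to mean records indexed for only one of the two terms.

```python
import math

def log_product_of_frequency(double_hits, disease_single_hits, gene_single_hits):
    """Natural log of the product of frequency (LPF) for one gene-disease pair."""
    if min(double_hits, disease_single_hits, gene_single_hits) <= 0:
        return None  # assumption: the score is left undefined when any count is zero
    product = (double_hits / disease_single_hits) * (double_hits / gene_single_hits)
    return math.log(product)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical counts: (double hits, disease single hits, gene single hits)
pairs = {
    "GENE_A": (120, 4000, 300),
    "GENE_B": (15, 4000, 900),
    "GENE_C": (3, 4000, 50),
}
lpf = {gene: log_product_of_frequency(*counts) for gene, counts in pairs.items()}

# Made-up fold changes, used only to show the correlation step
fold_change = {"GENE_A": 12.0, "GENE_B": 3.5, "GENE_C": 1.2}
genes = sorted(pairs)
rho = spearman([lpf[g] for g in genes], [fold_change[g] for g in genes])
```

Because the logarithm is monotonic, ranking genes by LPF gives the same order as ranking by the untransformed product of frequency, so the log transform changes the score distribution but not the Spearman coefficient.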

Global Analysis. Diseases with at least 50 related genes were
selected for clustering analysis, and the LPF scores were
normalized with the total score for each disease. Hierarchical
clustering was done with the "Cluster" software and the
clustering result was visualized using "TreeViewer"
(http://rana.lbl.gov/EisenSoftware.htm).

Breast Tissue Micro-Arrays. Eighty-nine breast cancer
samples (79% ER positive) and 7 normal breast tissue samples
were selected from the Harvard Breast SPORE frozen tissue
repository and were representative of the spectrum of histological
types, grades, and hormone receptor immuno-phenotypes
of breast cancer. Biotinylated cRNA, generated from the
total RNA extracted from the bulk tumor, was hybridized to
Affymetrix U95A oligonucleotide micro-arrays. These micro-arrays
consist of 12 400 probes, which represent approximately
9000 genes. Raw expression values were obtained using GENECHIP
software from Affymetrix, and then further analyzed using
the DNA-Chip Analyzer (dChip) custom software.

Results 

Automated Indexing of Medline Records by Disease and
Gene. To study the gene-disease associations in the literature,
we first compiled complete lists for human diseases and human
genes. To index all Medline records that were relevant to
human diseases, the Medical Subject Heading (MeSH) index
of Medline records was utilized. MeSH is a controlled medical
vocabulary from the National Library of Medicine and consists
of a set of terms or subject headings that are arranged in both
an alphabetic and a hierarchical structure. Medline records
are reviewed manually and MeSH terms are added to each with
software assistance. 9,10 Twenty-three human disease category
headings along with all of their child terms (see the Supporting
Information, Supplemental Table 1, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Table1.html) were
selected from the 2002 MeSH index creating a list of 4033
human diseases.

No index comparable to the MeSH index exists for genes,
and thus, it was necessary to apply a string search algorithm
for gene names or symbols found in Medline text. A complete
list of genes, gene names, gene symbols, and frequently used
synonyms was collected from the LocusLink database at
NCBI, 11,12 which contains 53 259 independent records keyed
by an official gene symbol or name (June 18th, 2002). For the
purposes of this study, no distinction was made between genes
and their gene products. Authors often use the same name for
both, differentiating the two only by the use of italics, if at all.
For the intended use of this study, this lack of distinction is
unlikely to have a large effect and may in fact be beneficial.

Initial attempts to search the literature using these lists
revealed several sources of false positives and false negatives
(Table 1). False positives primarily arose when the searched
term had other meanings, whereas false negatives arose from
syntax discrepancies, necessitating the development of filters
to reduce these errors. The syntax issues were readily handled
by including alternate syntax forms in the search terms. The
false positive cases, caused by duplicative and unrelated
meanings for the terms, were more difficult to manage. Where
possible, case-sensitive string mapping reduced inappropriate
citations. In many cases, however, this was not sufficient and
the terms had to be eliminated entirely, thereby reducing the
false positive rate but unavoidably under-representing some
genes.

For the purposes of data tracking, a primary gene key was
selected to represent all synonyms that correspond to each
gene. Medline records were indexed with a primary gene key
when any synonym for that key was found in the title or
abstract. Case-insensitive string mapping was used for all
searches except as noted above. No additional weight was

Table 1. Systematic Sources of False Positives and False Negatives in Unfiltered Data*

source of error                 error type      examples                                  filter solution
------------------------------  --------------  ----------------------------------------  ----------------------------
gene symbol/name                false positive  MAG - myelin associated glycoprotein;     eliminate this term
is not unique                                   MAG - malignancy-associated protein
gene symbol is                  false positive  PA - pallid homologue (mouse), pallidin   eliminate this term
unrelated abbreviation                          (also abbrev. for Pennsylvania)
gene symbol/name                false positive  WAS - Wiskott-Aldrich Syndrome            case-sensitive string search
has language meaning                            (also the word "was")
nonstandard syntax              false negative  BAG-1 instead of BAG1                     add dash term
unofficial gene name/symbol     false negative  P53 instead of TP53                       add all gene nicknames
nonspecified gene name          false negative  estrogen receptor instead of              add family stem term
                                                Estrogen receptor 1

* In preliminary studies, Medline was searched for co-occurrence of genes and diseases and the resulting output was evaluated to identify error sources that
were amenable to global filters. Each error source is categorized by the type of error it causes: false positives are suggested relationships that are not real and
false negatives are real relationships that are underrepresented. The filter solutions used are indicated. Note that in some cases, the filter solution itself introduces
error. In general, error rates maximized sensitivity, even at the expense of specificity if needed.



added for multiple occurrences of a term or the co-occurrence 
of multiple synonyms for the same gene key. 
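
The indexing rule described above (any synonym hit tags the record with the primary gene key, with no extra weight for repeats) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synonym table, the `index_record` function and the whole-term matching rule are all assumptions made for the example.

```python
import re

# Hypothetical synonym table: primary gene key -> (synonym, case_sensitive) pairs.
# "WAS" is matched case-sensitively so the English word "was" is not a hit.
SYNONYMS = {
    "WAS": [("WAS", True), ("Wiskott-Aldrich syndrome", False)],
    "TP53": [("TP53", False), ("P53", False)],
    "BAG1": [("BAG1", False), ("BAG-1", False)],
}

def index_record(text, synonyms=SYNONYMS):
    """Return the set of primary gene keys whose synonyms occur in text."""
    keys = set()
    for key, terms in synonyms.items():
        for term, case_sensitive in terms:
            flags = 0 if case_sensitive else re.IGNORECASE
            # Assumed rule: match whole terms only, not substrings of longer tokens.
            pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
            if re.search(pattern, text, flags):
                keys.add(key)  # a set, so repeated or multiple synonyms add no weight
                break
    return keys
```

Because the result is a set, a title that mentions both "p53" and "TP53" still contributes a single TP53 hit for that record.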

Medline records were searched with all qualified gene 
identifiers, such as the official/preferred gene symbol, the 
official/preferred gene name, all gene nicknames and all syntax 
variants. In situations where there are several members of a 
gene family or splice variants, some authors prefer to use a 
shortened gene family name, e.g., estrogen receptor instead of 
estrogen receptor 1 (ESR1), creating a source of false negatives. 
For this reason, gene family stem terms were created for all 
genes that have an alpha or numerical suffix (e.g., IL2RA, TGFB1, 
ESR1, etc.) and then used to search the literature. The family 
stem terms were handled separately from the specific gene 
names so that it would be clear when linkages were made to 
the gene family versus a specific member in that family. 
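
One way to picture the family stem terms is a rule that strips a trailing number, single letter, or Greek-letter word from a gene name. The sketch below is only an assumption about how such stems could be derived; the text does not give the actual rule, and the `SUFFIX` pattern and `family_stem` function are hypothetical.

```python
import re

# Assumed suffix rule: a final token that is a number, a single letter,
# or a spelled-out Greek letter, optionally preceded by a comma.
SUFFIX = re.compile(r"[,\s]+(?:\d+|[A-Za-z]|alpha|beta|gamma|delta)$", re.IGNORECASE)

def family_stem(gene_name):
    """Return the family stem of a gene name, or None if it has no suffix."""
    stem, n = SUFFIX.subn("", gene_name.strip())
    return stem if n else None

def search_terms(gene_name):
    """Keep specific-name and family-stem terms apart, as the text requires."""
    terms = [("gene", gene_name)]
    stem = family_stem(gene_name)
    if stem:
        terms.append(("family", stem))
    return terms
```

Tagging each term as "gene" or "family" keeps it clear whether a later hit links to the specific member or only to the family.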

To improve performance and accuracy, some pre-selection 
was applied to the records that were scanned. First, review 
articles were eliminated to avoid redundant treatment of 
citations. Second, non-English journals were removed because 
the natural language filters were only relevant to English 
publications. Finally, journals unlikely to contain primary data 
about gene-disease relationships were also removed (e.g., Int. 
J. Health Educ., Bedside Nurse, and J. Health Econ.). Together, 
these filters reduced the 12 198 221 Medline publications (July 
2002) by 37%. 

Ranking the Relative Strengths of Gene-Disease Associations. 
In total, there were 618 708 gene-disease co-citations, 
in which 16% (8297) of all studied genes had been associated 
to a disease and 96% (3875) of all diseases had been associated 
to at least one gene. To rank the relative strengths of gene-
disease relationships, we tested several different statistical 
methods and examined the results. With the exception of the 
relative risk estimates, the methods provided similar results 
with respect to the rank order of the gene-disease association 
strengths. However, after comparing the results to other 
databases and after consulting disease experts, the log of the 
product of frequency (LPF) was selected for further analysis 
because it gave the best results overall. 
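
The passage names the LPF score without spelling out its formula. As a hedged sketch, assume the two frequencies are the share of a gene's records that co-cite the disease and the share of a disease's records that co-cite the gene, and assume a base-10 logarithm. The function name `lpf` and the counts below are illustrative only.

```python
import math

def lpf(co_cited, gene_hits, disease_hits):
    """Log of the product of two co-citation frequencies (assumed definition)."""
    if co_cited == 0:
        return float("-inf")  # no co-citation: weakest possible score
    return math.log10((co_cited / gene_hits) * (co_cited / disease_hits))

def rank_genes(counts, disease_hits):
    """counts: {gene: (co_cited, gene_hits)}. Strongest association first."""
    return sorted(counts, key=lambda g: lpf(*counts[g], disease_hits), reverse=True)
```

Under this assumed definition, a gene cited in few papers overall but almost always alongside the disease can outrank a heavily studied gene with the same raw co-citation count.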

Validation of MedGene. In developing this tool, it was 
important to minimize the number of missed genes (false 
negatives) and miscalled genes (false positives). However, in 
situations when these goals were in conflict, inclusiveness was 
prioritized. To determine the false negative rate in MedGene, 
breast cancer was used as a test case because it was associated 
with more genes than any other human disease and because 




Figure 1. Estimation of the false negative rate by comparison 
with hand-curated databases. The breast cancer-related genes 
identified by MedGene were compared with those listed in 
several other databases including the Tumor Gene Database 
(TGD), 2 the Breast Cancer Gene Database (BCG), 1 GeneCards 
(GC)" and Swissprot. 18 Genes were considered false negatives 
if they were represented in at least one of these other databases 
and not in MedGene and their link to breast cancer was sup- 
ported by at least one literature reference. All literature references 
were verified by manual review to confirm their validity. The 
number of genes in each database or shared by more than one 
database is indicated. The false negative rate was calculated by 
genes missed at MedGene (26)/total number of nonoverlapping 
genes in other databases (285). 

there were several public databases that link genes to breast 
cancer. We compared the list of breast cancer-related genes 
from MedGene to these databases, illustrated in Figure 1. 
Among the 285 distinct breast cancer-related genes that were 
supported by at least one literature citation in these hand-curated 
databases, 26 were absent from MedGene, suggesting 
a false negative rate of approximately 9%. To determine why 
these were missed, all literature references for these genes (80 
papers) were reviewed manually (see the Supporting Information, 
Supplemental Table 2, or visit 
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 2.html). Among 
these papers, most false negatives were caused by nonstandard 
gene terms or gene terms eliminated by our specificity filters. 
Few genes were missed because they were only mentioned in 
review papers (0.4%) or they appeared only in the body of the 
manuscript but not the abstract or title (1.1%). Of note, 
MedGene identified approximately 2000 additional breast 
cancer-related genes not listed in any other database. 

To assess the false positive error rate, two complementary 
approaches were used: a detailed analysis of one disease and 
a global examination of 1000 diseases. The detailed approach 
examined the false positive error rate and its sources, whereas 
the global approach tested whether the overall results made 
biomedical sense. 

Using the LPF, 1467 genes related to prostate cancer were 
assembled in rank order. We then retrieved approximately 300 
Medline records each for the highest ranked 100 and the lowest 
ranked 200 genes and manually reviewed the titles and 
abstracts to determine the verity of the association. Nearly 80% 
of the highest ranked 100 genes fell into one of the five 
categories that reflect meaningful gene-disease relationships 
(see the Supporting Information, Supplemental Table 3, or visit 
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 3.html). 
Among the lowest ranked 200 genes, approximately 70% reflected 
true relationships. Of the 600 records 
reviewed, there were only two in which the association between 
the gene and the disease was described as negative. Both were 
genes with very low scores. In both cases, the authors did not 
argue the absence of any relationship, but rather that a 
particular feature of the gene or protein was not shown to be 
related to human prostate cancer. ]X 14 

The coincidence of some gene symbols with medical abbreviations, 
chemical abbreviations and biological abbreviations 
resulted in most of the false positives (see the Supporting 
Information, Supplemental Table 4, or visit 
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 4.html), 
emphasizing the importance of the filters that were added in the 
search algorithm (Table 1). Without the filters, the false positive 
rate more than doubled, and the false negative rate rose 
dramatically (data not shown). For example, among the papers 
about breast cancer, there were only 12 Medline records that 
referred to ESR1 and 10 to ESR2, whereas almost 2000 papers 
mentioned estrogen receptor without specifying ESR1 or ESR2; 
this latter group was detected by the family stem term filter. 

To further validate these results, a global analysis of the gene-
disease relationships described by MedGene was performed. 
For this experiment, it was reasoned that the more closely 
related the diseases are to one another, the more they will be 
related to the same gene sets. Thus, if the relationships defined 
by MedGene accurately reflected the literature, then an unsupervised 
hierarchical clustering of the gene data should group 
diseases in a manner consistent with common medical thinking. 
Conversely, if the clustered diseases do not make sense 
biologically or medically, it may reflect excessive false positives, 
false negatives, or inappropriate scoring of the data. 

To execute this experiment, the gene sets and the corre- 
sponding LPF values for 1000 randomly selected diseases (each 
with at least 50 gene relationships) were used as a dataset for 
clustering the diseases. A review of the results showed that the 
resulting disease clusters were indeed logical based upon 
common medical knowledge (see the Supporting Information, 
Supplemental Figure 1, or visit 
http://hipseq.med.harvard.edu/MedGene/publication/s_Figure 1.html). 
For example, in one 
such cluster shown in Figure 2, diabetes and its complications 
grouped together and were also closely linked to diseases 
associated with starvation states. 
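
The clustering experiment can be illustrated with a small pure-Python sketch. The text does not state the distance measure or linkage that was used, so cosine distance and average linkage are assumptions here, and the disease profiles passed in are expected to be toy `{gene: score}` dictionaries rather than real MedGene output.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse {gene: score} vectors."""
    dot = sum(v * b[g] for g, v in a.items() if g in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (na * nb)

def cluster(profiles):
    """Average-linkage agglomerative clustering; returns the merge order."""
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        best = None
        names = sorted(clusters)
        for i, x in enumerate(names):
            for y in names[i + 1:]:
                # Mean pairwise distance between members of the two clusters.
                d = sum(cosine_distance(profiles[p], profiles[q])
                        for p in clusters[x] for q in clusters[y])
                d /= len(clusters[x]) * len(clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merges.append((x, y))
        clusters[x + "+" + y] = clusters.pop(x) + clusters.pop(y)
    return merges
```

Diseases that share most of their genes merge early; a disease with a disjoint gene set joins last, which is the pattern the validation looks for.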

The number of genes associated with a given disease can 
be estimated by adjusting the MedGene number up by the false 
negative rate (~9%) and down by the false positive rate (~26% 
on average). Using this, the average disease has 103.7 ± 45.3 
(mean ± s.d.) genes associated with it, although the range is 
quite broad with 2359 genes related to breast cancer, 2122 
genes related to lung cancer and no genes related to a number 
of diseases. 
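
The two error rates quoted above follow from simple ratios, and the adjustment can be read in more than one way. The sketch below reproduces the false negative rate from the reported counts (26 missed out of 285) and then applies one possible reading of the adjustment, namely removing the false positive share and scaling up for the missed share. The exact formula is an assumption; the text does not give it.

```python
def false_negative_rate(missed, reference_total):
    """Share of reference-database genes that the tool did not find."""
    return missed / reference_total

def adjusted_gene_count(reported, fn_rate=0.09, fp_rate=0.26):
    """One assumed reading: drop false positives, then correct for misses."""
    true_hits = reported * (1 - fp_rate)   # hits that reflect real associations
    return true_hits / (1 - fn_rate)       # scale up for associations missed
```

For a disease with 100 reported genes this reading gives roughly 81 genes after adjustment.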

Applying MedGene to the Analysis of Large Datasets. Access 
to a comprehensive summary of the genes linked to human 
diseases provided an opportunity to analyze data obtained from 
a high-throughput experiment. We compared the MedGene 
breast cancer gene list to a gene expression data set generated 
from a micro-array analysis comparing breast cancer and 
normal breast tissue samples. Micro-array analysis identified 
2286 genes that had greater than a 1-fold difference in mean 
expression level between breast cancer samples and normal 
breast samples. Using MedGene, we sorted the 2286 genes into 
four classes: 555 genes directly linked to breast cancer in the 
literature by gene term search (first-degree association by gene 
name); 328 genes directly linked by family term search (first-
degree association by family term); 1021 genes linked to breast 
cancer only through other breast cancer genes (second-degree 
association); and 505 genes not previously associated with 
breast cancer. (See the Supporting Information, Supplemental 
Figure 2, or visit 
http://hipseq.med.harvard.edu/MedGene/publication/s_Figure 2.html.) 
Among the 505 previously unrelated genes, 467 were either newly 
identified genes or genes 
that had not previously been associated with any disease. 
Among the remaining 38 genes, 9 had been related to other 
cancers, specifically esophageal, colon, uterine, skin, and cervix. 

To determine whether the genes highlighted by the micro- 
array analysis were more likely to have been previously linked 
to breast cancer in the literature, we created a two-dimensional 
plot of the fold change of expression level between breast 
cancer and normal tissue versus the literature score (LPF) 
(Figure 3A). There was a broad spread of expression changes 
among the genes directly linked to breast cancer ranging from 
less than 1-fold change (68%) to over 40-fold (0.3%). Notably, 
the majority of genes with greater than 10-fold expression 
changes were linked to breast cancer by first-degree associa- 
tion. 

Among all 754 genes directly linked to breast cancer in the 
literature, there was no correlation between LPF and micro-
array fold change (r = 0.018, p value = 0.62). However, when 
we stratified the analysis based on the magnitude of the fold 
change, we observed an increasing trend in correlation (Figure 
3B), suggesting that genes with a more substantial change in 
expression level were more likely to have a stronger association 
in the literature. For genes that had 10-fold change or more in 
expression level, the correlation increased to 0.41 (p value = 
0.05). 
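
The stratified analysis can be sketched as a Spearman rank correlation recomputed on the subset of genes above each fold-change threshold (the Figure 3 caption names Spearman coefficients). The code below is a minimal pure-Python illustration; using the magnitude of the fold change, requiring at least three genes per stratum, and the function names are assumptions.

```python
def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks (None if undefined)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5

def stratified_spearman(genes, thresholds):
    """genes: list of (fold_change, lpf). Returns {threshold: rho or None}."""
    out = {}
    for t in thresholds:
        subset = [(abs(fc), s) for fc, s in genes if abs(fc) >= t]
        if len(subset) < 3:          # too few genes for a meaningful rho
            out[t] = None
            continue
        fcs, scores = zip(*subset)
        out[t] = spearman(fcs, scores)
    return out
```

Plotting the returned coefficients against the thresholds gives the kind of trend curve described for Figure 3B.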

When we evaluated the micro-array data separately for ER 
positive and ER negative tumors, the trend in correlation 
between fold change and literature score was highly dependent 
on estrogen receptor status. Interestingly, there was a similar 
trend in correlation for ER positive tumors, but no trend in 
correlation for ER negative tumors. 
[Figure 2B cluster labels: Coxsackievirus Infection, Obesity in Diabetes, Diabetic Ketoacidosis, Glucose Intolerance, Pregnancy in Diabetics, Diabetic Retinopathy, Hyperglycemia, Critical Illness, Diabetic Nephropathies] 
Figure 2. Global validation by clustering analysis. 2(A). The gene sets and the corresponding LPF values for 1000 diseases, each with 
at least 50 gene relationships, were used in an unsupervised clustering of the diseases based on the gene patterns associated with 
them. A sample of the data is shown here. 2(B). One of the resulting clusters is shown that corresponds to blood sugar states. Diabetes 
terms (above the line) and starvation states terms (under the line) clustered together. Within these groups, there is also clustering of 
diabetic small vessel complications, altered serum chemistries, nutritional disorders, etc. (Supplemental Figure 1: 
http://hipseq.med.harvard.edu/MedGene/publication/s_Figure 1.html). 

Finally, to validate our findings, we computed similar correlations 
between the breast cancer expression data and LPF scores generated 
by MedGene for hypertension, a disease unrelated to breast cancer. 
As expected, we did not observe an increasing trend in correlation 
for hypertension. 
Figure 3. Relationship between literature score and functional data for breast cancer. 3A. The data from an expression analysis of 
samples for breast tumors and normal breast tissue were analyzed to indicate the fold difference of expression level between breast 
tumor and normal sample (cutoff > 3-fold change). The fold changes were plotted against the literature score for the same gene set. 
Green dots represent first-degree association by gene search, blue dots represent first-degree association by family search and red 
dots represent no association. Some well-studied genes, such as BRCA2 (pink circle), are not reflected by a substantial difference in 
expression level. Furthermore, the majority of genes that have no association with breast cancer in the literature had less than 10-fold 
expression changes (shaded area). 3B. The Spearman rank-correlation coefficients between literature score (LPF) and the fold change 
of expression level between tumor and normal breast samples (y-axis) in relation to the amount of fold change of expression level 
(x-axis). Gene rank lists were generated for breast cancer (blue) and hypertension (pink). Correlations were also computed between 
the breast cancer gene LPF scores and fold change expression data among estrogen receptor positive tumors only (light blue) and 
estrogen receptor negative tumors only (purple). 
Table 2. Top 25 Genes Related to Selected Human Diseases* 

[Table body not recoverable; surviving entries: PGR, adhesion molecule, SIL] 

* MedGene results for the top 25 genes associated with breast neoplasms, hypertension, rheumatoid arthritis, bipolar disorder, and atherosclerosis, respectively, 
ranked by LPF scores. The hyperlink to all the papers co-citing the gene and the disease is available at the MedGene website 
(http://hipseq.med.harvard.edu/MedGene/). 



Discussion 

The Human Genome Project heralded a new era in biological 
research where the emphasis on understanding specific path-
ways has expanded to global studies of genomic organization 
and biological systems. High-throughput technologies can 
provide novel insight into comprehensive biological function 
but also introduce new challenges. The utility of these 
technologies is limited by the ability to generate, analyze, and 
interpret large gene lists. MedGene, a relational database 
derived by mining the information in Medline, was created to 
address this need. MedGene users can query for a rank-ordered 
list of human gene-disease relationships (Table 2) for one or 
more diseases. Each entry is hyperlinked to the original papers 
supporting each association and to other relevant databases. 

MedGene is an innovative extension of previous text mining 
approaches. Perez-Iratxeta et al. used the GO annotation and 
their chromosomal locations to predict genes that may contribute 
to inherited disorders.* MedGene takes a broader view 
and includes all diseases and all possible gene-disease relationships. 
Furthermore, MedGene utilizes co-citation to indicate a 
relationship rather than GO annotation, which is limited to the 
subset of genes that have GO annotation. Our approach is 
complementary to that taken by Chaussabel and Sher, who 
used the frequency of co-cited terms to cluster genes into a 
hierarchy of gene-gene relationships. 6 

A unique aspect of this tool is the ability to assess the relative 
strengths of gene-disease relationships based on the frequency 
of both co-citation and single citation. This presupposes that 
most co-citations describe a positive association, often referred 
to as publication bias 15 and is supported by our observations 



that negative associations are rare (Supplemental Table 3: 
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 3.html). 
Of course, relationships established by frequency 
of co-citation do not necessarily represent a true biological link; 
however, it is strong evidence to support a true relationship. 

Another important feature of MedGene is the implementation 
of software filters that substantially reduced the error rate. 
We estimate that less than 10% of all associations were missed 
and at least 70% of even the weakest associations were real. 
For this study, all of the filters that we applied were general 
ones, e.g., expanding the list of all gene names to address the 
different syntax forms used by different journals, eliminating 
gene names that correspond to common English words, etc. 
The majority of the remaining search term ambiguities were 
idiosyncratic and difficult to identify systematically without 
causing a significant rise in false negatives. Alternative ap- 
proaches, such as the examination of the nearest neighbor 
terms, need to be considered to further reduce the false positive 
rate. 

It is not uncommon to see expression changes in micro-
array experiments as small as 2-fold reported in the literature. 
Even when these expression changes are statistically significant, 
it is not always clear if they are biologically meaningful. When 
comparing expression levels of disease to normal tissue, one 
expects an enrichment of known disease-related genes to 
appear in the altered expression group. MedGene provided a 
unique opportunity to test this notion in the context of existing 
knowledge on a novel breast cancer micro-array dataset. For 
genes displaying a 5-fold change or less in tumors compared 
to normal, there was no evidence of a correlation between 
altered gene expression and a known role in the disease. This 
Table 3. Genes with Large Expression Changes in ER- but Not in ER+ Breast Tumors 

gene symbol   fold change (ER+)   fold change (ER-)
KRTHB1          1.0                610.8
BRS3            1.2                 89.4
DKK1            1.2                 69.8
ZIC1            1.9                 59.6
TLR1            1.0                 38.5
[...]           2.6                 33.2
[...]           1.0                 30.6
EBI2            [...]               27.9
[...]           3.8                 21.9
STK18           [...]               18.6
GPR49           [...]               14.6
[...]           1.6                 13.5
[...]           4.2                 13.0
[...]           [...]               12.9
[...]          -1.2                 12.3
[...]           2.3                 12.2
[...]           [...]               11.8
CCNE2           [...]               11.6
FGB            -4.3                 11.1
KNSL6           2.9                 10.9
H1FS            3.0                 10.2
SERPINH2        4.6                 10.2
YAP1            1.0                 10.0
LPHB           -1.3                -10.4
TCEA2          -1.1                -10.8
TFF1            1.3                -11.4
COL17A1        -4.1                -15.7
POPS            1.1                -16.2
BPAG1          -4.6                -22.3
PDZK1          -1.1                -36.8
VEGFC          -2.8                -51.5
MUC6           -1.4                -64.9
SERPINA5       -1.0                -83.1
MEIS1          -1.6                -85.9
CA12            2.4               -150.3

Table 3. MedGene identified a set of relatively understudied, yet highly 
expressed genes in ER negative, but not ER positive breast tumors. All of 
these genes have either never been co-cited with breast cancer or have a 
weak association except those marked with an *. 



reflects the many genes whose role in breast cancer may not 
involve large changes in expression in sporadic tumors (e.g., 
BRCA1 and BRCA2) and genes whose modest changes in 
expression may be unrelated to the disease. Strikingly, among 
genes with a 10-fold change or more in expression level, there 
was a strong and significant correlation between expression 
level and a published role in the disease, providing the first 
global validation of the micro-array approach to identifying 
disease-specific genes. 

The results derived from MedGene have two implications. 
First, a careful hunt for corroborating evidence of a role in 
breast cancer should precede any further study of genes with 
less than 5-fold expression level changes. Second, any genes 
with 10-fold changes or more are likely to be related to breast 
cancer and warrant attention. It is likely that this threshold will 
change depending on the disease as well as the experiment. 

Interestingly, the observed correlation was only found among 
ER-positive tumors, not ER-negative. This may reflect a bias 
in the literature to study the more prevalent type of tumor in 
the population. Furthermore, this emphasizes that caution 
must be taken when interpreting experiments that may contain 
subpopulations that behave very differently. The MedGene 
approach identified a set of relatively understudied, yet highly 
expressed genes in ER-negative tumors that are worthy of 
further examination (Table 3). 



In conclusion, we have developed an automated method of 
summarizing and organizing the vast biomedical literature. To 
our knowledge, the resulting database is the most comprehen- 
sive and accurate of its kind. By generating a score that reflects 
the strength of the association, it provides an important tool 
for the rapid and flexible analysis of large datasets from various 
high-throughput screening experiments. Furthermore, it can 
be used for selecting subsets of genes for functional studies, 
for building disease-specific arrays, for looking at genes common 
to multiple diseases and various other high-throughput 
applications. In the future, it will be possible to enhance the 
utility of the MedGene database by building links between 
genes and other MeSH terms as well as other biological 
processes and concepts, such as cell division and responses to 
small molecules. 
1 Introduction 

A proteome has been defined as the protein complement 
expressed by the genome of an organism, or, in multicel- 
lular organisms, as the protein complement expressed by a 
tissue or differentiated cell. In the most common implementation 
of proteome analysis the proteins extracted 
from the cell or tissue analyzed are separated by high 

Correspondence: Professor Ruedi Aebersold, Department of Molecular 
Biotechnology. University of Washington, Box 357730, Seattle, WA, 
98195, USA (Tel: +206-685-4235; Fax: +206-685-6392; E-mail: 
ruedi@u.washington.edu) 

Abbreviations: CID, collision-induced dissociation; MS/MS, tandem 
mass spectrometry; SAGE, serial analysis of gene expression 

Keywords: Proteome / Two-dimensional polyacrylamide gel electro-
phoresis / Tandem mass spectrometry 

© WILEY-VCH Verlag GmbH, 69451 Weinheim, 1998 



Proteome analysis: Biological assay or data archive? 

In this review we examine the current state of proteome analysis. There are 
three main issues discussed: why it is necessary to study proteomes; how 
proteomes can be analyzed with current technology; and how proteome analysis 
can be used to enhance biological research. We conclude that proteome 
analysis is an essential tool in the understanding of regulated biological systems. 
Current technology, while still mostly limited to the more abundant proteins, 
enables the use of proteome analysis both to establish databases of proteins 
present, and to perform biological assays involving measurement of multiple 
variables. We believe that the utility of proteome analysis in future biological 
research will continue to be enhanced by further improvements in analytical 
technology. 

resolution two-dimensional gel electrophoresis (2-DE), 
detected in the gel and identified by their amino acid 
sequence. The ease, sensitivity and speed with which gel-
separated proteins can be identified by the use of recently 
developed mass spectrometry techniques have dramatically 
increased the interest in proteome technology. One 
of the most attractive features of such analyses is that complex 
biological systems can potentially be studied in their 
entirety, rather than as a multitude of individual components. 
This makes it far easier to uncover the many complex, 
and often obscure, relationships between mature 
gene products in cells. Large-scale proteome characterization 
projects have been undertaken for a number of different 
organisms and cell types. Microbial proteome projects 
currently in progress include, for example: Saccharomyces 
cerevisiae [2], Salmonella enterica [3], Spiroplasma 
melliferum [4], Mycobacterium tuberculosis [5], Ochrobactrum 
anthropi [6], Haemophilus influenzae [7], Synechocystis 
spp. [8], Escherichia coli [9], Rhizobium leguminosarum 
[10], and Dictyostelium discoideum [11]. Proteome 
projects underway for tissues of more complex organ- 
isms include those for: human bladder squamous cell 
carcinomas [12], human liver [13], human plasma [13], 
human keratinocytes [12], human fibroblasts [12], mouse 
kidney [12], and rat serum [14]. In this manuscript we cri- 
tically assess the concept of proteome analysis and the 
technical feasibility of establishing complete proteome 
maps, and discuss ways in which proteome analysis and 
biological research intersect. 



2 Rationale for proteome analysis 

The dramatic growth in both the number of genome 
projects and the speed with which genome sequences 
are being determined has generated huge amounts of 
sequence information, for some species even complete 
genomic sequences ([15-17]). The description of the 
state of a biological system by the quantitative measure-
ment of system components has long been a primary 
objective in molecular biology. With recent technical 
advances including the development of differential dis-
play-PCR [18], cDNA microarray and DNA chip techno-
logy [19, 20] and serial analysis of gene expression 
(SAGE) [21, 22], it is now feasible to establish global and 
quantitative mRNA expression maps of cells and tissues, 
in which the sequence of all the genes is known, at a 
speed and sensitivity which is not matched by current 
protein analysis technology. Given the long-standing 
paradigm in biology that DNA synthesizes RNA which 
synthesizes protein, and the ability to rapidly establish 
global, quantitative mRNA expression maps, the ques- 
tions which arise are why technically complex proteome 
projects should be undertaken and what specific types of 
information could be expected from proteome projects 
which cannot be obtained from genomic and transcript 
profiling projects. We see three main reasons for pro- 
teome analysis to become an essential component in the 
comprehensive analysis of biological systems: (i) protein 
expression levels are not predictable from the mRNA 
expression levels, (ii) proteins are dynamically modified 
and processed in ways which are not necessarily 
apparent from the gene sequence, and (iii) proteomes 
are dynamic and reflect the state of a biological system. 

2.1 Correlation between mRNA and protein expression 
levels 

Interpretations of quantitative mRNA expression profiles 
frequently implicitly or explicitly assume that for specific 
genes the transcript levels are indicative of the levels of 
protein expression. As part of an ongoing study in our 
laboratory, we have determined the correlation of expres- 
sion at the mRNA and protein levels for a population of 
selected genes in the yeast Saccharomyces cerevisiae 
growing at mid-log phase (S. P. Gygi et al., submitted for 
publication). mRNA expression levels were calculated 
from published SAGE frequency tables [22]. Protein 
expression levels were quantified by metabolic radiola- 
beling of the yeast proteins, liquid scintillation counting 
of the protein spots separated by high resolution 2-DE 
and mass spectrometric identification of the protein(s) 
migrating to each spot. The selected 80 samples consti- 
tute a relatively homogeneous group with respect to pre- 
dicted half-life and expression level of the protein pro- 
ducts. Thus far, we have found a general trend but no 
strong correlation between protein and transcript levels 
(Fig. 1). For some genes studied, equivalent mRNA transcript 
levels translated into protein abundances which 
varied by more than 50-fold. Similarly, equivalent steady-
state protein expression levels were maintained by transcript 
levels varying by as much as 40-fold (S. P. Gygi 
et al., submitted). These results suggest that even for a 
population of genes predicted to be relatively homoge- 
neous with respect to protein half-life and gene expres- 
sion, the protein levels cannot be accurately predicted 
from the level of the corresponding mRNA transcript. 

2.2 Proteins are dynamically modified and processed 

In the mature, biologically active form many proteins are 
post-translationally modified by glycosylation, phosphor- 
ylation, prenylation, acylation, ubiquitination or one or 
more of many other modifications [23] and many pro- 
teins are only functional if specifically associated or com- 
plexed with other molecules, including DNA, RNA, pro- 
teins and organic and inorganic cofactors. Frequently, 
modifications are dynamic and reversible and may alter 
the precise three-dimensional structure and the state of 
activity of a protein. Collectively, the state of modifica- 
tion of the proteins which constitute a biological system 
Figure 1. Correlation between mRNA and protein levels in yeast cells. 
For a selected population of 80 genes, protein levels were measured 
by 35S-radiolabeling and mRNA levels were calculated from published 
SAGE tables. Inset: expanded view of the low abundance region. 
For more experimental details, also see Figs. 5 and 6, (S. P. Gygi et al., 
submitted). 



is an important indicator of the state of the system. The 
type of protein modification and the sites modified at a 
specific cellular state can usually not be determined 
from the gene sequence alone. 

2.3 Proteomes are dynamic and reflect the state of a 
biological system 

A single genome can give rise to many qualitatively and 
quantitatively different proteomes. Specific stages of the 
cell cycle and states of differentiation, responses to 
growth and nutrient conditions, temperature and stress, 
and pathological conditions represent cellular states 
which are characterized by significantly different pro- 
teomes. The proteome, in principle, also reflects events 
that are under translational and post-translational con- 
trol. It is therefore expected that proteomics will be able 
to provide the most precise and detailed molecular des- 
cription of the state of a cell or tissue, provided that the 
external conditions defining the state are carefully deter- 
mined. In answer to the question of whether the study 
of proteomes is necessary for the analysis of biomolecular
systems, it is evident that the analysis of mature protein
products in cells is essential as there are numerous
levels of control of protein synthesis, degradation, 
processing and modification, which are only apparent by 
direct protein analysis. 



3 Description and assessment of current proteome 
analysis technology 

3.1 Technical requirements of proteome technology

In biological systems the level of expression as well as 
the states of modification, processing and macromolecular
association of proteins are controlled and modulated
depending on the state of the system. Comprehensive
analysis of the identity, quantity and state of modification
of proteins therefore requires the detection and
quantitation of the proteins which constitute the system,
and analysis of differentially processed forms. There are 
a number of inherent difficulties in protein analysis 
which complicate these tasks. First, proteins cannot be 
amplified. It is possible to produce large amounts of a 
particular protein by over-expression in specific cell sys- 
tems. However, since many proteins are dynamically 
post-translationally modified, they cannot be easily am- 
plified in the form in which they finally function in the 
biological system. It is frequently difficult to purify from 
the native source sufficient amounts of a protein for 
analysis. From a technological point of view this translates
into the need for high sensitivity analytical techniques.
Second, many proteins are modified and processed
post-translationally. Therefore, in addition to the
protein identity, the structural basis for differentially 
modified isoforms also needs to be determined. The dis- 
tribution of a constant amount of protein over several 
differentially modified isoforms further reduces the 
amount of each species available for analysis. The com- 
plexity and dynamics of post-translational protein edit- 
ing thus significantly complicates proteome studies. 
Third, proteins vary dramatically with respect to their 
solubility in commonly used solvents. There are few, if 
any, solvent conditions in which all proteins are soluble 
and which are also compatible with protein analysis. This 
makes the development of protein purification methods 
particularly difficult since both protein purification and 
solubility have to be achieved under the same condi- 
tions. Detergents, in particular sodium dodecyl sulfate 
(SDS), are frequently added to aqueous solvents to 
maintain protein solubility. The compatibility with SDS 
is a big advantage of SDS polyacrylamide gel electro- 
phoresis (SDS-PAGE) over other protein separation 
techniques. Thus, SDS-PAGE and two-dimensional gel 
electrophoresis, which also uses SDS and other deter- 
gents, are the most general and preferred methods for 
the purification of small amounts of proteins, provided 
that activity does not necessarily need to be maintained. 
Lastly, the number of proteins in a given cell system is 
typically in the thousands. Any attempt to identify and 
categorize all of these must use methods which are as 
rapid as possible to allow completion of the project 
within a reasonable time frame. Therefore, a successful, 
general proteomics technology requires high sensitivity, 
high throughput, the ability to differentiate differentially 
modified proteins, and the ability to quantitatively display
and analyze all the proteins present in a sample.

3.2 2-D electrophoresis — mass spectrometry: a common 
implementation of proteome analysis 

The most common currently used implementation of 
proteome analysis technology is based on the separation 
of proteins by two-dimensional (IEF/SDS-PAGE) gel 
electrophoresis and their subsequent identification and 
analysis by mass spectrometry (MS) or tandem mass 
spectrometry (MS/MS). In 2-DE, proteins are first separ- 
ated by isoelectric focusing (IEF) and then by SDS- 
PAGE, in the second, perpendicular dimension. Separ- 
ated proteins are visualized at high sensitivity by staining 
or autoradiography, producing two-dimensional arrays of 
proteins. 2-DE gels are, at present, the most commonly 
used means of global display of proteins in complex
samples. The separation of thousands of proteins has
been achieved in a single gel [24, 25] and differentially 
modified proteins are frequently separated. Due to the 
compatibility of 2-DE with high concentrations of deter- 
gents, protein denaturants and other additives promoting 
protein solubility, the technique is widely used. 

The second step of this type of proteome analysis is the 
identification and analysis of separated proteins. Individ- 
ual proteins from polyacrylamide gels have traditionally 
been identified using N-terminal sequencing [26, 27],
internal peptide sequencing [28, 29], immunoblotting or
comigration with known proteins [30]. The recent dramatic
growth of large-scale genomic and expressed
sequence tag (EST) sequence databases has resulted in a 
fundamental change in the way proteins are identified by 
their amino acid sequence. Rather than by the traditional 
methods described above, protein sequences are now fre- 
quently determined by correlating mass spectral or 
tandem mass spectral data of peptides derived from pro- 
teins, with the information contained in sequence data- 
bases [31-33]. 

There are a number of alternative approaches to pro- 
teome analysis currently under development. There is 
considerable interest in developing a proteome analysis 
strategy which bypasses 2-DE altogether, because it is
considered a relatively slow and tedious process, and 
because of perceived difficulties in extracting proteins 
from the gel matrix for analysis. However, 2-DE as a 
starting point for proteome analysis has many advan- 
tages compared to other techniques available today. The 
most significant strengths of the 2-DE-MS approach 
include the relatively uniform behavior of proteins in 
gels, the ability to quantify spots and the high resolution 
and simultaneous display of hundreds to thousands of 
proteins within a reasonable time frame. 

A schematic diagram of a typical procedure of the identi- 
fication of gel-separated proteins is shown in Fig. 2. Pro- 
tein spots detected in the gel are enzymatically or chemi- 
cally fragmented and the peptide fragments are isolated 
for analysis, as already indicated, most frequently by MS 
or MS/MS. There are numerous protocols for the gener- 
ation of peptide fragments from gel-separated proteins. 
They can be grouped into two categories, digestion in 
the gel slice [28, 34] or digestion after electrotransfer out 
of the gel onto a suitable membrane ([29, 35-37] and 
reviewed in [38]). In most instances either technique is 
applicable and yields good results. The analysis of MS or 
MS/MS data is an important step in the whole process 
because MS instruments can generate an enormous 
amount of information which cannot easily be managed 
manually. Recently, a number of groups have developed 
software systems dedicated to the use of peptide MS 
and MS/MS spectra for the identification of proteins. 
Proteins are identified by correlating the information 
contained in the MS spectra of protein digests or 
MS/MS spectra of individual peptides with data con- 
tained in DNA or protein sequence databases. 
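
As a rough illustration of how the MS spectrum of a protein digest can be correlated with a sequence database, the following minimal Python sketch performs an in silico tryptic digest and counts the observed peptide masses that fall within a tolerance of the predicted ones. The function names, the toy two-entry database, the simplified cleavage rule (cut after every K or R, ignoring missed cleavages and the proline exception) and the 0.5 Da tolerance are all illustrative assumptions; this is not the software referred to in the text.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(sequence):
    """Split a protein after every K or R (simplified trypsin rule)."""
    peptides, current = [], ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance=0.5):
    """Number of observed masses explained by the predicted digest."""
    predicted = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(obs - pred) <= tolerance for pred in predicted)
        for obs in observed_masses
    )


def best_match(observed_masses, database, tolerance=0.5):
    """Database entry whose predicted digest explains the most masses."""
    return max(
        database,
        key=lambda name: count_matches(observed_masses, database[name], tolerance),
    )


# Hypothetical toy database and a spectrum simulated from one entry.
database = {"protA": "GASKPVTRCLL", "protB": "MHFKYWRDE"}
observed = [peptide_mass(p) for p in tryptic_peptides(database["protA"])]
print(best_match(observed, database))
```

Real search programs additionally weigh mass accuracy, missed cleavages and modifications; the count of matching masses used here is only the simplest possible score.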

[Figure 2: flow diagram with the steps: separate proteins; digest
protein into peptides; MS spectrum, database search; MS/MS spectrum,
database search.]

Figure 2. Schematic diagram of a procedure for identification of gel-
separated proteins. Peptides can either be separated by a technique
such as LC or CE, or infused as a mixture and sorted in the MS. Data-
base searching can either be performed on peptide masses from an
MS spectrum, peptide fragment masses from CID spectra of peptides,
or a combination of both.

The systems we are currently using in our laboratory are
based on the separation of the peptides contained in protein
digests by narrow bore or capillary liquid chromatography
[39, 40] or capillary electrophoresis [41], the analysis
of the separated peptides by electrospray ionization
(ESI) MS/MS, and the correlation of the generated
peptide spectra with sequence databases using the 
SEQUEST program developed at the University of Wash- 
ington [32, 33]. The system automatically performs the 
following operations: a particular peptide ion character- 
ized by its mass-to-charge ratio is selected in the MS out 
of all the peptide ions present in the system at a parti- 
cular time; the selected peptide ion is collided in a colli- 
sion cell with argon (collision-induced dissociation, 
CID) and the masses of the resulting fragment ions are 
determined in the second sector of the tandem MS; this 
experimentally determined CID spectrum is then corre- 
lated with the CID spectra predicted from all the pep- 
tides in a sequence database which have essentially the 
same mass as the peptide selected for CID; this correla- 
tion matches the isolated peptide with a sequence seg- 
ment in a database and thus identifies the protein from 
which the peptide was derived. There are a number of 
alternative programs which use peptide CID spectra for 
protein identification, but we use the SEQUEST system 
because it is currently the most highly automated pro- 
gram and has proven to be successful, versatile and 
robust. 
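
The two-stage logic described above (first restrict the database to peptides of essentially the same mass as the selected ion, then compare predicted and measured fragment masses) can be sketched as follows. This is a minimal sketch under stated assumptions: it scores candidates by a simple count of shared b- and y-ion peaks, which is a stand-in for, and not a reproduction of, the correlation scoring used by SEQUEST. The function names, tolerances and candidate peptides are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def fragment_ions(peptide):
    """Predicted singly charged b- and y-ion m/z values."""
    ions = []
    for i in range(1, len(peptide)):
        b_ion = sum(RESIDUE_MASS[r] for r in peptide[:i]) + PROTON
        y_ion = sum(RESIDUE_MASS[r] for r in peptide[i:]) + WATER + PROTON
        ions.extend([b_ion, y_ion])
    return ions


def shared_peak_count(observed_peaks, peptide, tolerance=0.5):
    """Number of predicted fragment ions found among the observed peaks."""
    return sum(
        any(abs(ion - peak) <= tolerance for peak in observed_peaks)
        for ion in fragment_ions(peptide)
    )


def identify(precursor_mass, observed_peaks, candidates,
             mass_tolerance=1.0, fragment_tolerance=0.5):
    """Return the best-scoring candidate peptide, or None if no mass matches."""
    # Stage 1: keep only database peptides of essentially the same mass.
    same_mass = [p for p in candidates
                 if abs(peptide_mass(p) - precursor_mass) <= mass_tolerance]
    if not same_mass:
        return None
    # Stage 2: rank the survivors by agreement of their fragment ions.
    return max(same_mass,
               key=lambda p: shared_peak_count(observed_peaks, p, fragment_tolerance))


# Hypothetical example: GASK and SAGK have identical mass but different
# fragment ions, so only the CID spectrum can tell them apart.
candidates = ["GASK", "SAGK", "PVTR"]
spectrum = fragment_ions("SAGK")
print(identify(peptide_mass("SAGK"), spectrum, candidates))
```

The example shows why the fragment comparison is needed: the precursor mass alone cannot distinguish peptides with the same composition but a different sequence.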



3.3 Protein identification by LC-MS/MS, capillary
LC-MS/MS and CE-MS/MS

It has been demonstrated repeatedly that MS has a very
high intrinsic sensitivity. For the routine analysis of
gel-separated proteins at high sensitivity, the most significant
challenge is the handling of small amounts of
sample. The crux of the problem is the extraction and
transferal of peptide mixtures generated by the digestion
of low nanogram amounts of protein, from gels into the
MS/MS system without significant loss of sample or
introduction of unwanted contaminants. We employ
three different systems for introducing gel-purified samples
into an MS, depending on the level of sensitivity
required. As an approximate guideline, for samples containing
tens of picomoles of peptides, LC-MS/MS is
most appropriate; for samples containing low picomole
amounts to high femtomole amounts we use capillary
LC-MS/MS; and for samples containing femtomoles or
less, CE-MS/MS is the method of choice.

3.3.1 LC-MS/MS

The coupling of an MS to an HPLC system using a
0.5 mm diameter or bigger reverse phase (RP) column
has been described in detail [42]. This system has several
advantages if a large number of samples are to be analyzed
and all are available in sufficient quantity. The
LC-MS and database searching program can be run in a
fully automated mode using an autosampler, thus maximizing
sample throughput and minimizing the need for
operator interference. The relatively large column is
tolerant of high levels of impurities from either gel preparation
or sample matrix. Lastly, if configured with a
flow-splitter and micro-sprayer [40], analyses can be performed
on a small fraction of the sample (less than 5%)
while the remainder of the sample is recovered in very
pure solvents. This latter feature is particularly useful
when an orthogonal technique is also used to analyze
peptide fractions, such as scintillation of an introduced
radiolabel, and this data can be correlated with peptides
identified by CID spectra.

3.3.2 Capillary LC-MS

An increase of sensitivity of approximately tenfold can be
achieved by using a capillary LC system with a 100 µm ID
column rather than a 0.5 mm ID column as referred to
above. Since very low flow rates are required for such
columns, most reports have used a precolumn flow splitting
system for producing solvent gradients. We have
recently described the design and construction of a novel
gradient mixing system which enables the formation
of reproducible gradients at very low flow rates (low
nL/min) without the need for flow splitting (A. Ducret
et al., submitted for publication). Using this capillary
LC-MS/MS system we were able to identify gel-separated
proteins if low picomole to high femtomole amounts
were loaded onto the gel [40]. This system is as yet not
automated and, like all capillary LC systems, is prone to
blockage of the columns by microparticulates when analyzing
gel-separated proteins.



3.3.3 CE-MS/MS 

The highest level of sensitivity for analyzing gel-sep- 
arated proteins can be achieved by using capillary elec- 
trophoresis - mass spectrometry (CE-MS). We have de- 
scribed in the past a solid-phase extraction capillary elec- 
trophoresis (SPE-CE) system which was used with triple 
quadrupole and ion trap ESI-MS/MS systems for the 
identification of proteins at the low femtomole to subfemtomole
sensitivity level [43, 44]. While this system is
highly sensitive, it is labor-intensive to operate and has
not been automated. In order to devise an
analytical system with both the sensitivity of a CE and 
the level of automation of LC, we have constructed
microfabricated devices for the introduction of samples
into ESI-MS for high-sensitivity peptide analysis.

Figure 3. Schematic illustration of a microfabricated analytical
system for CE, consisting of a micromachined device, coated capillary
electroosmotic pump, and microelectrospray interface. The dimensions
of the channels and reservoir are as indicated in the text. The
channels on the device were graphically enhanced to make them more
visible. Reproduced from [45], with permission.

The basic device is a piece of glass into which channels 
of 10-30 µm in depth and 50-70 µm in diameter are
etched by using photolithography/etching techniques 
similar to the ones used in the semiconductor industry. 
(A simple device is shown in Fig. 3). The channels are 
connected to an external high voltage power supply [45]. 
Samples are manipulated on the device and off the 
device to the MS by applying different potentials to the 
reservoirs. This creates a solvent flow by electroosmotic 
pumping which can be redirected by changing the posi- 
tion of the electrode. Therefore, without the need for 
valves or gates and without any external pumping, the 
flow can be redirected by simply switching the position 
of the electrodes on the device. The direction and rate of 
the flow can be modulated by the size and the polarity 
of the electric field applied and also by the charge state 
of the surface. 

The type of data generated by the system is illustrated in 
Fig. 4, which shows the mass spectrum of a peptide sample 
representing the tryptic digest of carbonic anhydrase at 
290 fmol/µL. Each numbered peak indicates a peptide successfully
identified as being derived from carbonic anhydrase.
Some of the unassigned signals may be chemical
or peptide contaminants. The MS is programmed to auto- 
matically select each peak and subject the peptide to CID. 
The resulting CID spectra are then used to identify the 
protein by correlation with sequence databases. Therefore, 
this system allows us to concurrently apply a number of 
protein digests onto the device, to sequentially mobilize 
the samples, to automatically generate CID spectra of 
selected peptide ions and to search sequence databases 
for protein identification. These steps are performed auto- 
matically without the need for user input and proteins can 
be identified at very low femtomole level sensitivity at a 
rate of approximately one protein per 15 min. 

3.4 Assessment of 2-DE-MS proteome technology 

Using a combination of the analytical techniques de- 
scribed above we have identified the 80 protein spots 
indicated in Fig. 5. The protein pattern was generated by 
separating a total of 40 µg of protein contained
in a total cell lysate of the yeast strain YPH499 by high 
resolution 2-DE and silver staining of the separated pro- 
teins. To estimate how far this type of proteome analysis 
can penetrate towards the identification of low abun- 
dance proteins, we have calculated the codon bias of the 
genes encoding the respective proteins. Codon bias is a
calculated measure of the degree of redundancy of triplet
DNA codons used to produce each amino acid in a
particular gene sequence. It has been shown to be a
useful indicator of the level of the protein product of a
particular gene sequence present in a cell [46]. The general
rule which applies is that the higher the value of the
codon bias calculated for a gene, the more abundant the
protein product of that gene becomes. The calculated
codon bias values corresponding to the proteins identified
in Fig. 5 are shown in Fig. 6B. Nearly all of the proteins
identified (> 95%) have codon bias values of > 0.2,
indicating they are highly abundant in cells. In contrast,
codon bias values calculated for the entire yeast genome
(Fig. 6A) show that the majority of proteins present in
the proteome have a codon bias of < 0.2 and are thus of
low abundance.

Figure 4. MS spectrum of a tryptic digest of carbonic anhydrase using
the microfabricated system shown in Fig. 3. 290 fmol/µL of carbonic
anhydrase tryptic digest was infused into a Finnigan LCQ ion trap MS.
Each peak was selected for CID, and those which were identified as
containing peptides derived from carbonic anhydrase are numbered.
Reproduced from [45], with permission.

This finding is of considerable importance in our assess- 
ment of the current status of proteome analysis technol- 
ogy. It is clear that even using highly sensitive analytical 
techniques, we are only able to visualize and identify the 



more abundant proteins. Since many important regula- 
tory proteins are present only at low abundance, these 
would not be amenable to analysis using such tech- 
niques. This situation would be exacerbated in the anal- 
ysis of proteomes containing many more proteins than 
the approximately 6000 gene products present in yeast 
cells [16]. In the analysis of, for example, the proteome 
of any human cells, there are potentially 50 000-100 000
gene products [47]. Inherent limitations on the amount 
of protein that can be loaded on 2-DE, and the number 
of components that can be resolved, indicate that only 
the most highly abundant fraction of the many gene 
products could be successfully analyzed. One approach 
that has been employed to circumvent these limitations
is the use of very narrow range immobilized pH gradient 
strips for the first-dimension separation of 2-DE [48]. 
Since only those proteins which focus within the narrow 
range will enter the second dimension of separation, a 
much higher sample loading within the desired range is 
possible. This, in turn, can lead to the visualization and 
identification of less abundant proteins.

Figure 6. Calculated codon bias values for yeast proteins. (A) Distribution
of calculated values for the entire yeast proteome. (B) Distribution
of calculated values for the subset of 80 identified proteins also
shown in Figs. 1 and 5. Further details of experimental procedures are
included in S. P. Gygi et al. (submitted).



4 Utility of proteome analysis for biological 
research 

For the success of proteomics as a mainstream approach 
to the analysis of biological systems it is essential to
define how proteome analysis and biological research 
projects intersect. Without a clear plan for the implemen- 
tation of proteome-type approaches into biological re- 
search projects the full impact of the technology can not 
be realized. The literature indicates that proteome anal- 
ysis is used both as a database/data archive, and as a bio- 
logical assay or biological research tool. 

4.1 The proteome as a database 

The use of proteomics as a database or data archive 
essentially entails an attempt to identify all the proteins 
in a cell or species and to annotate each protein with the 
known biological information that is relevant for each 
protein. The level of annotation can, of course, be exten- 
sive. The most common implementation of this idea is 
the separation of proteins by high resolution 2-DE, the 
identification of each detected protein spot and the 
annotation of the protein spots in a 2-DE gel database 
format. This approach is complicated by the fact that it is 
difficult to precisely define a proteome and to decide 
which proteome should be represented in the database. 
In contrast to the genome of a species, which is essen- 
tially static, the proteome is highly dynamic. Processes 
such as differentiation, cell activation and disease can all 
significantly change the proteome of a species. This is 
illustrated in Fig. 7. The figure shows two high-resolu- 



tion 2-DE maps of proteins isolated from rat serum. 
Fig. 7A is from the serum of normal rats, while Fig. 7B
is acute-phase serum from rats previously treated
with an inflammation-causing agent [49].
It is obvious that the protein patterns are significantly 
different in several areas, raising the question of exactly 
which proteome is being described. 

Therefore, a comprehensive proteome database of a spe- 
cies or cell type needs to contain all of the parameters 
which describe the state and the type of the cells from 
which the proteins were extracted as well as the software 
tools to search the database with queries which reflect 
the dynamics of biological systems. A comprehensive 
proteome database should be capable of quantitatively 
describing the fate of each protein if specific systems 
and pathways are activated in the cell. Specifically, the 
quantity, the degree of modification, the subcellular loca- 
tion and the nature of molecules specifically interacting 
with a protein as well as the rate of change of these 
variables should be described. Using these admittedly 
stringent criteria, there is currently no complete proteome
database. A number of such databases are, however, in 
the process of being constructed. The most advanced 
among them, in our opinion, are the yeast protein data- 
base YPD [50] (accessible at http://www.ypd.com) and 
the human 2D-PAGE databases of the Danish Centre 
for Human Genome Research [12] (accessible at
http://biobase.dk/cgi-bin/celis). While neither can be considered
complete as not all of the potential gene products
are identified, both contain extensive annotation
of supplemental information for many of the spots 
which are positively identified in reference samples. 
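
To make the criteria listed above more concrete, the following Python sketch shows one possible shape for a record in such a database: each observation of a protein is tied to the cell state in which it was measured and carries quantity, modifications, subcellular location and interaction partners, and a rate of change is derived from two observations. All class names, field names and example values are hypothetical; neither database cited above is claimed to use this layout.

```python
from dataclasses import dataclass, field


@dataclass
class CellState:
    """Parameters describing the cells the proteins were extracted from."""
    cell_type: str
    conditions: dict  # e.g. {"temperature_C": 30, "treatment": "none"}


@dataclass
class ProteinObservation:
    """One protein measured in one defined cell state."""
    protein_id: str
    state: CellState
    quantity: float  # abundance in arbitrary units
    modifications: list = field(default_factory=list)
    subcellular_location: str = "unknown"
    interactors: list = field(default_factory=list)


def rate_of_change(earlier, later, elapsed_minutes):
    """Change in quantity per minute between two observations of one protein."""
    if earlier.protein_id != later.protein_id:
        raise ValueError("observations refer to different proteins")
    return (later.quantity - earlier.quantity) / elapsed_minutes


def query(observations, **conditions):
    """Observations whose cell state matches every given condition."""
    return [
        obs for obs in observations
        if all(obs.state.conditions.get(key) == value
               for key, value in conditions.items())
    ]


# Hypothetical example: the same protein before and after a treatment.
resting = CellState("yeast", {"treatment": "none"})
stressed = CellState("yeast", {"treatment": "heat"})
observations = [
    ProteinObservation("P1", resting, 10.0),
    ProteinObservation("P1", stressed, 30.0, modifications=["phosphorylation"]),
]
print(rate_of_change(observations[0], observations[1], elapsed_minutes=20))
print(len(query(observations, treatment="heat")))
```

Keying every observation to an explicit cell state is the design choice that lets queries reflect the dynamics of the system rather than a single reference pattern.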

4.2 The proteome as a biological assay 

The use of proteome analysis as a biological assay or 
research tool represents an alternative approach to inte- 
grating biology with proteomics. To investigate the state 
of a system, samples are subjected to a specific process
that allows the quantitative or qualitative measurement 
of some of the variables which describe the system. In 
typical biochemical assays one variable (e.g., enzyme 
activity) of a single component (e.g., a particular en- 
zyme) is measured. Using proteomics as an assay, mul- 
tiple variables (e.g., expression level, rate of synthesis, 
phosphorylation state, etc.) are measured concurrently 
on many (ideally all) of the proteins in a sample. The 
use of proteomics as an assay is a less far-reaching prop- 
osition than the construction of a comprehensive pro- 
teome database. It does, however, represent a pragmatic 
approach which can be adapted to investigate specific 
systems and pathways, as long as the interpretation of 
the results takes into account that with current technol- 
ogy not all of the variables which describe the system 
can be observed (see Section 3.4). 

A common implementation of proteome analysis as a 
biological assay is when a 2-DE protein pattern gener- 
ated from the analysis of an experimental sample is 
compared to an array of reference patterns representing 
different states of the system under investigation. The 
state of the experimental system at the time the sample 
was generated is therefore determined by the quantitative
comparative analysis of hundreds to a few thousand
proteins. Comparative analysis of the 2-DE patterns furthermore
highlights quantitative and qualitative differences
in the protein profiles which correlate with the
state of the system. For this type of analysis it is not
essential that all the proteins are identified or even visualized,
although the results become more informative as
more proteins are compared. It is obvious, however, that 
the possibility to identify any protein deemed character- 
istic for a particular state dramatically enhances this 
approach by opening up new avenues for experimenta- 
tion. 
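
The comparison of an experimental 2-DE pattern with an array of reference patterns can be sketched as follows. The sketch assumes that spots have already been matched between gels and quantified, so each pattern is simply a mapping from a spot identifier to an intensity. The distance measure (root-mean-square log ratio over shared spots), the twofold cutoff, the function names and the intensity values are illustrative assumptions, not a method taken from the text.

```python
import math


def pattern_distance(sample, reference):
    """Root-mean-square log intensity ratio over the spots both patterns share."""
    shared = set(sample) & set(reference)
    if not shared:
        raise ValueError("patterns share no spots")
    squared = [math.log(sample[s] / reference[s]) ** 2 for s in shared]
    return math.sqrt(sum(squared) / len(squared))


def closest_state(sample, references):
    """Name of the reference state whose pattern is nearest to the sample."""
    return min(references,
               key=lambda state: pattern_distance(sample, references[state]))


def changed_spots(sample, reference, min_fold=2.0):
    """Shared spots whose intensity differs by at least min_fold either way."""
    changed = {}
    for spot in set(sample) & set(reference):
        fold = sample[spot] / reference[spot]
        if fold >= min_fold or fold <= 1.0 / min_fold:
            changed[spot] = fold
    return changed


# Hypothetical spot intensities (arbitrary units) for two reference states.
references = {
    "normal": {"s1": 100.0, "s2": 50.0, "s3": 10.0, "s4": 5.0},
    "acute": {"s1": 20.0, "s2": 50.0, "s3": 80.0, "s4": 5.0},
}
sample = {"s1": 25.0, "s2": 55.0, "s3": 70.0, "s4": 6.0}

print(closest_state(sample, references))
print(sorted(changed_spots(sample, references["normal"])))
```

Only spots present in both patterns enter the comparison, which mirrors the point made above that not all proteins need to be identified or even visualized for the assay to be informative.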




Figure 7. High resolution 2-DE map of proteins isolated from rat serum with or without prior exposure to an inflammation-causing
agent. (A) Normal rat serum, (B) acute-phase serum from rats which had previously been exposed to
an inflammation-causing agent. The first dimension of separation is an IPG from pH 4-10, and the second dimension
is a 7.5-17.5%T gradient SDS-PAGE gel. Proteins were visualized by staining with amido black. Further details
of experimental procedures are included in [14, 49].




Proteome analysis as a biological assay has been success- 
fully used in the field of toxicology, to characterize 
disease states or to study differential activation of cells.
The approach is limited, of course, by the fact that only 
the visible protein spots are included in the assay, and it 
is well known that a substantial but far from complete 
fraction of cellular proteins are detected if a total cell 
lysate is separated by 2-DE. Proteins may not be 
detected in 2-DE gels because they are not abundant 
enough to be visualized by the detection method used, 
because they do not migrate within the boundaries (size, 
pI) resolved by the gel, because they are not soluble
under the conditions used, or for other reasons. 

A different way to use proteome analysis as a biological 
assay to define the state of a biological system is to take 
advantage of the wealth of information contained in 
2-DE protein patterns. 2-DE is referred to as two-dimen- 
sional because of the electrophoretic mobility and the 
isoelectric points which define the position of each pro- 
tein in a 2-DE pattern. In addition to the two dimen- 
sions used to generate the protein patterns, a number of 
additional data dimensions are contained in the protein 
patterns. Some of these dimensions such as protein 
expression level, phosphorylation state, subcellular loca- 
tion, association with other proteins, rate of synthesis or 
degradation indicate the activity state of a protein or a 
biological system. Comparative analysis of 2-DE protein 
patterns representing different states is therefore ideally 
suited for the detection, identification and analysis of 
suitable markers. Once again it must be emphasized that 
in this type of experiment only a fraction of the cellular 
proteins is analyzed. Since many regulatory proteins are 
of low abundance, this limitation is a concern, particu- 
larly in cases in which regulatory pathways are being 
investigated. 

5 Concluding remarks 

In this report we have addressed three main issues 
related to proteome analysis. First, we have discussed 
the rationale for studying proteomes. Second, we have 
assessed the technical feasibility of analyzing proteomes 
and described current proteome technology, and third, 
we have analyzed the utility of proteome analysis for bio- 
logical research. It is apparent that proteome analysis is 
an essential tool in the analysis of biological systems. 
The multi-level control of protein synthesis and degrada- 
tion in cells means that only the direct analysis of 
mature protein products can reveal their correct identi- 
ties, their relevant state of modification and/or associa- 
tion and their amounts. Recently developed methods 
have enabled the identification of proteins at ever- 
increasing sensitivity levels and at a high level of auto- 
mation of the analytical processes. A number of tech- 
nical challenges, however, remain. While it is currently 
possible to identify essentially any protein spots that can 
be visualized by common staining methods, it is ap- 
parent that without prior enrichment only a relatively 
small and highly selected population of long-lived, 
highly expressed proteins is observed. There are many 
more proteins in a given cell which are not visualized by 
such methods. Frequently it is the low abundance proteins
that execute key regulatory functions.

We have outlined the two principal ways proteome analysis
is currently being used to intersect with biological
research projects: the proteome as a database or data 
archive and proteome analysis as a biological assay. Both 
approaches have in common that at present they are con- 
ceptually and technically limited. Current proteome data- 
bases typically are limited to one cell type and one state 
of a cell and therefore do not account for the dynamics 
of biological systems. The use of proteome analysis as a 
biological assay can provide a wealth of information, but 
it is limited to the proteins detected and is therefore not 
truly proteome-wide. These limitations in proteomics are 
to a large extent a reflection of the fact that proteins in 
their fully processed form cannot easily be amplified and 
are therefore difficult to isolate in amounts sufficient for 
analysis or experimentation. The fact that to date no 
complete proteome has been described further attests to 
these difficulties. With continued rapid progress in pro- 
tein analysis technology, however, we anticipate that the 
goal of complete proteome analysis will eventually 
become attainable.

Discordant Protein and mRNA Expression in
Lung Adenocarcinomas*

Guoan Chen, Tarek G. Gharib, Chiang-Ching Huang, Jeremy M. G. Taylor,
David E. Misek, Sharon L. R. Kardia, Thomas J. Giordano, Mark D. Iannettoni,
Mark B. Orringer, Samir M. Hanash, and David G. Beer



The relationship between gene expression measured at
the mRNA level and the corresponding protein level is not
well characterized in human cancer. In this study, we
compared mRNA and protein expression for a cohort of
genes in the same lung adenocarcinomas. The abundance
of 165 protein spots representing 98 individual
genes was analyzed in 76 lung adenocarcinomas and nine
nonneoplastic lung tissues using two-dimensional polyacrylamide
gel electrophoresis. Specific polypeptides
were identified using matrix-assisted laser desorption/ionization
mass spectrometry. For the same 85 samples,
mRNA levels were determined using oligonucleotide microarrays,
allowing a comparative analysis of mRNA and
protein expression among the 165 protein spots. Twenty-eight
of the 165 protein spots (17%) or 21 of 98 genes
(21.4%) had a statistically significant correlation between
protein and mRNA expression (r > 0.2445; p < 0.05);
however, among all 165 proteins the correlation coefficient
values (r) ranged from -0.467 to 0.442. Correlation
coefficient values were not related to protein abundance.
Further, no significant correlation between mRNA and
protein expression was found (r = -0.025) if the average
levels of mRNA or protein among all samples were applied
across the 165 protein spots (98 genes). The mRNA/protein
correlation coefficient also varied among proteins
with multiple isoforms, indicating potentially separate
isoform-specific mechanisms for the regulation of
protein abundance. Among the 21 genes with a significant
correlation between mRNA and protein, five genes
differed significantly between stage I and stage III lung
adenocarcinomas. Using a quantitative analysis of mRNA
and protein expression within the same lung adenocarcinomas,
we showed that only a subset of the proteins
exhibited a significant correlation with mRNA abundance.
Molecular & Cellular Proteomics 1:304-313, 2002.



Lung cancer is the leading cause of cancer death for both
men and women in the United States. Adenocarcinomas of
the lung comprise ~40% of all new cases of non-small cell
lung cancer and are now the most common histologic type.
Functional genomics, broadly defined as the comprehensive
analysis of genes and their products, has become a recent
focus of the life sciences (1). Application of these approaches to
lung adenocarcinomas has the potential to aid in the identification
of high risk patients with resectable early stage lung cancer
that may benefit from adjuvant therapy, as well as to identify
new therapeutic targets. In human lung cancer, however, little is
currently understood regarding the relationship between gene
expression as determined by measuring mRNA levels and the
corresponding abundance of the protein products.

A number of powerful techniques for analysis of gene expression
have been used including differential display (2),
serial analysis of gene expression (3), DNA microarrays (4),
and proteomics via two-dimensional polyacrylamide gel
electrophoresis and mass spectrometry (5). Bioinformatics tools
have also been developed to help determine quantitative
mRNA/protein expression profiles of all types of cells and
tissues (6) and now can be applied to benign and malignant
tumors. DNA microarrays (cDNA and oligonucleotide) permit
the parallel assessment of thousands of genes and have been
utilized in gene expression monitoring (7), polymorphism
analysis (8), and DNA sequencing (9). Recent studies have
focused on classification or identification of subgroups of lung
tumors using DNA microarrays (10, 11). The use of mRNA
expression patterns by themselves, however, is insufficient for
understanding the expression of protein products, as additional
post-transcriptional mechanisms, including protein
translation, post-translational modification, and degradation,
may influence the level of a protein present in a given cell or
tissue. Proteomic analysis, a complementary technology to
DNA microarrays for monitoring gene expression, involves
protein separation and quantitative assessment of protein
spots using 2D¹-PAGE and protein identification using mass
spectrometry. By combining proteomic and transcriptional
analyses of the same samples, however, it may be possible to
understand the complex mechanisms influencing protein
expression in human cancer.

In this study, we determined mRNA and protein levels for
165 proteins (98 genes) in 76 lung adenocarcinomas and nine



¹ The abbreviations used are: 2D, two-dimensional; MALDI-MS,
matrix-assisted laser desorption/ionization mass spectrometry.
Protein and mRNA Correlation in Lung Adenocarcinomas



Table I

Correlation coefficients of protein and mRNA where only one spot was present on 2D gels
r, correlation coefficient value > 0.2445; p < 0.05. Values in boldface are significant at p < 0.05.

Spot      Unigene       Gene name   r             Protein name
1104      Hs.184510     SFN         0.4337        14-3-3 σ
0994      Hs.77840      ANXA4       0.4219        Annexin IV
1314      Hs.10958      DJ-1        0.[?]982      DJ-1 protein/MEKS
1454      Hs.78428      SOD1        0.3863        Superoxide dismutase (Cu-Zn)
1638      Hs.227751     LGALS1      0.3318        Galectin 1
0264      Hs.129548     HNRPK       0.3034        Transformation up-regulated nuclear protein
1405      Hs.111334     FTL         0.2849        Ferritin light chain
0963      Hs.300711     ANXA5       0.2488        Annexin V
1252      Hs.4745       PSMC        0.2445        26S proteasome p28
0906      Hs.234489     LDHB        0.4420        L-lactate dehydrogenase H chain (LDH-B)
1171      Hs.241515     COX11       0.2310        COX11
1160      Hs.181013     PGAM1       0.2023        Phosphoglycerate mutase
0759      Hs.74635      DLD         0.[?]985      Dihydrolipoamide dehydrogenase precursor
1193      Hs.833[?]3    AOE372      [illegible]   Antioxidant enzyme AOE372
0172      Hs.3069       HSPA9B      [illegible]   GRP75
0777      Hs.979        PDHB        [illegible]   Pyruvate dehydrogenase E1-β subunit precursor
1249      Hs.226795     GSTP1       [illegible]   Glutathione transferase pi (GST-pi)
1665      Hs.76136      TXN         [illegible]   Thioredoxin
1205      Hs.82314      HPRT1       [illegible]   HG phosphoribosyltransferase
1230      Hs.279860     TPT1        0.1456        Translationally controlled tumor protein (TCTP)
0603      Hs.181357     LAMR1       0.1453        LamR
1358      Hs.28914      APRT        0.1399        Adenine phosphoribosyltransferase
1410      Hs.82113      DUT         [illegible]   dUTP pyrophosphatase (dUTPase)
1826      Hs.112378     LIMS1       [illegible]   Pinch-2 protein
0871      Hs.250502     CA8         0.1199        Carbonic anhydrase-related protein; Syntaxin
0289      Hs.82916      CCT6A       [illegible]   Chaperonin-like protein
1143      Hs.11465      GSTTLp28    0.0997        Glutathione S-transferase homolog (GST homolog)
1456      Hs.118638     NME1        0.0932        Nm23 (NDPKA)
1698      Hs.278503     RIG         0.0905        RIG (U32331)
1354      Hs.89781      ATP5D       0.0904        F1F0-type ATP synthase subunit d
1445      Hs.155485     HIP2        0.0843        Huntingtin interacting protein 2 (HIP2)
1479      Hs.177486     APP         0.0746        Amyloid B4A
0608      Hs.182265     KRT19       0.0439        Cytokeratin 19
1071      Hs.10842      RAN         0.0277        GTP-binding nuclear protein RAN (TC4)
0991      Hs.297939     CTSB        0.0254        Cathepsin B
0842      Hs.77274      PLAU        0.0248        Urokinase plasminogen activator
0823      Hs.198248     B4GALT1     0.0183        β-1,4-galactosyl transferase
0613      Hs.1247       APOA4       0.0176        Apolipoprotein A4 (ApoA4)
1338      Hs.104143     CLTA        0.0123        Clathrin light chain A
0802      Hs.5123       SID6-306    0.0117        Cytosolic inorganic pyrophosphatase
1688      Hs.1473       GRP         -0.0040       Preprogastrin-releasing peptide
0265      Hs.274402     HSPA1B      -0.0071       Heat shock-induced protein
1414      Hs.77641      ARF5        -0.0098       ADP-ribosylation factor 1
0710      Hs.97208      HIP1        -0.0114       Huntingtin interacting protein 1 (HIP1)
0632      Hs.170328     MSN         -0.0132       Moesin/E
0525      Hs.284255     ALPP        -0.0148       Alkaline phosphatase, placental
0513      Hs.76901      PDIR        -0.0289       Protein disulfide isomerase-related protein 6
1659      Hs.256697     HINT        -0.0312       Protein kinase C inhibitor
1262      Hs.7016       RAB7        -0.0362       Rab 7 protein
0190      Hs.184411     ALB         -0.0470       Albumin
0948      Hs.2795       LDHA        -0.0549       Lactate dehydrogenase-A (LDHA)
0502      Hs.180532     GPI         -0.0575       Hsp89
0152      Hs.75410      HSPA5       -0.0640       GRP78
1054      Hs.74276      CLIC1       -0.0686       Nuclear chloride channel (RNCC protein)
0709      Hs.253495     SFTPD       -0.0936       Pulmonary surfactant protein D
0867      Hs.78996      PCNA        -0.0982       PCNA
0165      Hs.180414     HSPA8       -0.1014       Heat shock cognate protein, 71 kDa
1109      Hs.75103      YWHAZ       -0.1018       14-3-3 ζ/δ
0137      Hs.554        SSA2        -0.1032       Ro/SS-A antigen
0278      Hs.4112       TCP1        -0.1237       T-complex protein 1, α subunit
1769      Hs.9614       NPM1        -0.1738       B23/numatrin
0039      Hs.74335      HSPCB       -0.2049       Hsp90
2511      Hs.163179     FABP5       -0.2109       E-FABP/FABP5
1[?]39    Hs.16488      CALR        -0.2344       Calreticulin 32
1138      Hs.301901     GSTM4       -0.2438       Glutathione S-transferase M4 (GST m4)
2533      Hs.77060      PSMB6       -0.2512       Macropain subunit Δ



nonneoplastic lung tissues. Protein levels were determined
using quantitative 2D-PAGE analysis, and the separated
protein polypeptides were identified using matrix-assisted laser
desorption/ionization mass spectrometry (MALDI-MS). The
corresponding mRNA levels for the identified proteins within
the same samples were determined using oligonucleotide
microarrays. Correlation analyses showed that protein
abundance is likely a reflection of the transcription for a subset of
proteins, but translation and post-translational modifications
also appear to influence the expression levels of many
individual proteins in lung adenocarcinomas.

EXPERIMENTAL PROCEDURES 

Tissues - Fifty-seven stage I and 19 stage III lung adenocarcinomas,
as well as nine non-neoplastic lung tissue samples, were used
for protein and mRNA analyses. Patient consent was obtained, and
the project was approved by the Institutional Review Board. All
tissues were obtained after resection at the University of Michigan
Health System between May 1991 and July 1998. Tissues were
snap-frozen in liquid nitrogen and then stored at -80 °C. The patients
included 46 females and 30 males ranging in age from 40.9 to 84.6
(average 63.8) years. Most patients (66/76) demonstrated a positive
smoking history. Sixty-one tumor samples were classified as
bronchial-derived, 14 were classified as bronchoalveolar, and one had
both features. Eighteen tumor samples were classified as well
differentiated, [illegible]8 were classified as moderate, and 19 were
classified as poorly differentiated adenocarcinomas. Hematoxylin-stained
cryostat sections (5 μm), prepared from the same tumor pieces to be
utilized for protein and mRNA isolation, were evaluated by a pathologist
and compared with hematoxylin- and eosin-stained sections made from
paraffin blocks of the same tumors. Specimens were excluded from
analysis if they showed unclear or mixed histology (e.g. adenosquamous),
tumor cellularity less than 70%, potential metastatic origin as
indicated by previous tumor history, extensive lymphocytic infiltration
or fibrosis, or if the patient had received prior chemotherapy or
radiotherapy.

Oligonucleotide Array Hybridization - The HuGeneFL oligonucleotide
arrays (Affymetrix, Santa Clara, CA) containing 6800 genes were
used in this study. Total RNA was isolated from all samples using
TRIzol reagent (Invitrogen). The resulting RNA was then subjected to
further purification using RNeasy spin columns (Qiagen). Preparation
of cRNA, hybridization, and scanning of the HuGeneFL arrays were
performed according to the manufacturer's protocol (Affymetrix,
Santa Clara, CA). Data analysis was performed using GeneChip 4.0
software. The gene expression profile of each tumor was normalized
to the median gene expression profile for the entire sample. Details of
data trimming and normalization are described elsewhere (11).

2D-PAGE and Quantitative Protein Analysis - Tissue for both protein
and mRNA isolation came from contiguous areas of each sample.
Protein separation using 2D-PAGE, silver staining, and digitization
were performed as described previously (12, 13). Our 2D-PAGE system
allows us to run 20 gels at one time (one batch). Spot detection
and quantification were accomplished utilizing Bio Image Visage System
software (BioImage Corp., Ann Arbor, MI). The integrated intensity
of each spot was calculated as the measured optical density
units × mm². Of the total possible 2000 spots detectable on each gel,
820 spots on the gel of each sample were matched using a Gel-ed
match program with the same spots on a chosen "master" gel. In
each sample, 250 ubiquitously expressed reference spots were used
to adjust for variations between gels, such as that created by subtle
differences in protein loading or gel staining. Slight differences
because of batch were corrected after spot-size quantification.
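
The text does not give the formula used for the reference-spot adjustment. As a minimal sketch of one plausible approach, and not the authors' procedure, each gel can be rescaled so that its reference spots line up with the same spots on the master gel. The function names, the median-of-ratios rule, and all intensities below are assumptions made for illustration.

```python
from statistics import median

def reference_scale_factor(gel, master, reference_spots):
    """Median ratio (master / gel) over the reference spots.

    Assumption: a single multiplicative factor per gel is enough to
    correct for differences in protein loading or staining.
    """
    ratios = [master[s] / gel[s] for s in reference_spots if gel.get(s, 0) > 0]
    return median(ratios)

def normalize_gel(gel, master, reference_spots):
    """Rescale every spot on a gel by the reference-spot factor."""
    factor = reference_scale_factor(gel, master, reference_spots)
    return {spot: value * factor for spot, value in gel.items()}

# Hypothetical integrated intensities (optical density units x mm^2).
master = {"ref1": 10.0, "ref2": 20.0, "ref3": 40.0, "target": 5.0}
gel = {"ref1": 20.0, "ref2": 40.0, "ref3": 80.0, "target": 12.0}  # stained about 2x darker

normalized = normalize_gel(gel, master, ["ref1", "ref2", "ref3"])
print(normalized["target"])
```

Using the median of the ratios rather than the mean keeps a few badly matched reference spots from dominating the factor; with 250 reference spots per gel, as described above, that robustness is the main reason to prefer it in this sketch.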

Mass Spectrometry and 2D Western Blotting - Preparative 2D gels
were run using extracts from A549 lung adenocarcinoma cells
(obtained from ATCC) and using the identical experimental conditions as
the analytical 2D gels, except 30% more protein was loaded. The
resolved protein gels were silver-stained using successive
incubations in 0.02% sodium thiosulfate for 2 min, 0.1% silver nitrate for 40
min, and 0.014% formaldehyde plus 2% sodium carbonate for 10
min. For protein identification, protein polypeptides underwent trypsin
digestion followed by MALDI-MS using a MALDI-TOF Voyager-DE
mass spectrometer (Perseptive Biosystems, Framingham, MA). The
[illegible] were compared with known [illegible] digest databases using
[illegible] of California [illegible]. Some of the polypeptides
included in the analysis had been identified prior to this study on
the basis of sequencing (14). The identified protein spots used in this
paper are shown in Fig. 1A. The method for 2D-PAGE Western blot
verification was as described previously (15). The 2D Western blots of
GRP58 and Op18 are shown in Fig. 1, C and E; the others, such as
[illegible] GRP75, HSP70, HSC70, KRT8, [illegible],
ApoJ, 14-3-3, annexin I, annexin II, PGP9.6, DJ-1, GST-pi, and
PGAM, are described elsewhere.²

Statistical Analysis - Missing values were replaced with the mean
value of the protein spot. The transform x → log(1 + x) was applied
to the expression values. The relationship between
protein and mRNA expression levels within the same samples was
examined using the Spearman correlation coefficient analysis (16). To
identify potentially significant correlations between gene and protein
expression, we used an analytical strategy similar to SAM (significance
analysis of microarrays) (17), which uses a permutation technique
to determine the significance of changes in gene expression
between different biological states. To obtain permuted correlation
coefficients between gene and protein expression, genes were
exchanged first in such a way that permuted correlation coefficients
were calculated based on pseudo pairs of genes and proteins. The
distribution of permuted correlation coefficients became stable after
60 permutations. This procedure was then repeated 60 times to
obtain 60 sets of permuted correlation coefficients. For each of the
60 permutations, the correlations of genes and proteins were ranked



² Chen et al., submitted for publication.
Table II

Correlation coefficients of protein and mRNA where multiple isoforms were present on 2D gels
r, correlation coefficient value > 0.2445; p < 0.05. Values in boldface are significant at p < 0.05.

Spot

Unigene

Gene name



1484 
0957 
0353 
0855 
1198 
1203 
0523 
1432 
1493 
1181 
0439 
0505 
0593 
1874 
0935 
2524 
2324 
1182 
0350 
0992 
08S1 
0853 
2603 
0381 
0371 
1179 
0762 
0760 
2508 
0772 
0723 



1237 
1234 



0427 
0424 
0883 
0780 
1527 
1484 
1728 
1712 
0947 
1232 
122Q 
1595 
1810 
1450 
1458 
0619 
0615 
1250 
0549 
0338 
0333 
0331 
2381 
0635 



Hs.81916 

He.77899 

Hs.289101 

Hs.1 69476 

H8.41707 

Hs.83848 

Hs.65114 

Hs,81916 

Hs.81915 

Hs.78225 

Hs.242463 

HSJ297763 

Hs.297753 

Hs.75313 

Hs.75544 

Ha.78225 

Hs.65114 

Hs,41707 

H&289101 

Hs.75313 

He.75313 

Hs.75313 

Hs.76392 

Hs.76392 

Hs.76392 

Hs.78225 

Hs.78225 

Hs.78225 

He.217493 

Hs.217493. 

Hs.217493 

HS.93194 

Hs.93194 

Hs.93194 

Hs.25 

Hs.26 

Hs.25 

HeJ5106 

Hs.75106 

Hs.119140 

Hs.119140 

HS.S241 

He.5241 

Hs.169476 

Hs.76207 

Hs.75207 

Hs.158300 

He.75980 

He.75990 

Hs.75990 

Hs.75990 

He.75990 

H*,41707 

Hs.79037 

Hs.79037 

Hs.79037 

Hs.79037 

Hs.65114 

Hs.65114 



LAP 18 

TPM1 

GKP58 

GAPD 

HSPB3 

7PI1 

KRT18 

LAP18 

LAP18 

ANXA1 

KRT8 

VIM 

VIM 

AKR1B1 

YWHAH 

ANXA1 

KRT18 

HSPB3 

GRP68 

AKR1B1 

AKR1B1 

AKR1B1 

ALDH1 

ALDH1 

ALOH1 

ANXA1 

ANXA1 

ANXA1 

ANXA2 

ANXA2 

ANXA2 

APOA1 

APOA1 

APOA1 

ATPSB 

ATP5B 

ATP5B 

CLU 

CLU 

EF5A 

E1F5A 

FABP1 

FABP1 

QAP0 

GU01 

GL01 

HAP1 

HP 

HP 

HP 

HP 

HP 

HSPB3 
HSPD1 
HSPD1 
HSP01 
HSPD1 
KRT18 
KRT18 



Protein name 



0.4003 
0.3930 
0.3802 
0.3893 



0.3335 
0.3234 
0.3164 
0.3102 
0.3049 
0.2039 
6.2809 
0.2790 
0.2775 
0.2612 
0.2601 
02558 
0.2516 
-0.2460 
0.0761 
-0.0675 
-0.0565 
-0.0371 
-0.0880 
0.2062 
-0.0739 
-0,0228 
0.2223 
0.2080 
0.0701 
0.1133 
-0.0373 
-0.0894 
0.0080 
0.0122 
-0.0992 
-0.0483 
-0.0443 
-0.0726 
-0.0376 
-0/1916 
-0.0473 
0.1745 
0.2249 
0.0450 
-0.0137 
-0.4672 
0.0802 
-0.0305 
0.0461 
-0.0034 
-0.1024 
0.1074 
0.2265 
0.1383 
0.1603 
0.2016 
0.1106 



OP18(Stathmln) 
Tropomyosins 1-6 

Protease disulfide Isomerase (GRP58) 
Glyceraldehyde-3-phosphate dehydrogenase 
Hsp27 

Trlose phosphate Isomerase (TPl) 
Cytokeratln 18 
OP18(Stathmfo) 
OPl8(Stathmln) 
Annexln variant I 
Cytokeratln 8 



VFrnsntln 
Aldose reductase 
14-3-3 i, 
Annexln J 
Cytokeratln 18 
Hsp27 

PhosphoH>ase C (GRP68) 
Aldose reductase 
Aldose reductase 
Aldose reductase 
Aldehyde dehydrogenase 
Aldehyde dehydrogenase 
Aldehyde dehydrogenase 
Annexln variant I 
Annexln I 
Annexln I 

Mpocotfn (annexln II) 
Upocotln (annexln II) 
Upocotln 

Apollpoprotein A1 (ApoA1) 

AfcoUpoprotefo A1 (ApoA1) 

Apoflpoprotefn A1 (ApoA1) 

ATP synthase p subunlt precursor 

ATP synthase p subunlt precursor 

ATP synthase fi subunlt precursor 

Apollpoprotein J (ApoJ) 

Apollpoprotein J (ApoJ) 

elF-5A 

elF-6A 

L-FABP 

L-FABP 

Orycera!dehyde-3-phosphate dehydrogenase 
Gfyoxalase-l 

Gtyoxalase-1 

Hunting-associated protein 1 (neuroan 1) 

a-Haptoglobh 

a-Haptoglobin 

^-Haptoglobin 

B-haptogtobln 

B-haptoglobin 

Hsp27 

Hsp60 

Hspeo 
Hspeo 

Hsp60 

Cytokeratln 18 
Cytokeratln 18 
Table II, continued

Spot

Unigene

Gene name

r


0529 


Ha.85114 


KRT18 




0528 


H$.65114 


KRT18 


00414 


0527 


H8.65114 


KRT18 




0514 


Hs.65114 


KRT18 




0451 


H$ 242463 


K"RTfl 


U»Ul 1 I 


0446 


Hs.242463 


fxn i o 


U.Uo47 


0444 


Hs.242463 




-0.131 1 


0443 


Ha 242463 


i\rt jo 


0,0942 

A AiAc 

0,0495 


1488 


Ha 61915 

1 19. V IV (W 


LMTI5 


0321 


Ha 76655 




-0.0546 


0320 


Hfi 75655 


DilUD 

r4Hb 


-0.0041 


1063 




DUD 


0,0441 


0837 


Ur 75393 


DUD 


0,1402 


0326 




ScnrlNAl 


-0.0227 


0322 


Ha 90 7 Aft 1 


ocHrlNAl 


-0.0277 


0241 




ocRPINAi 


-0.0148 


1280 


ne.ou 1*104 


SFTPAt 


-0.1488 


1278 




SFTPA1 


-0.2040 


0866 
0778 


ns./oyt>u 


TNMT1 


0.1162 




TNNTI 


0.0740 


1213 




TPI1 


0.0024 




HS.P3848 


TPI1 


0.0490 


1207 


Li « 

rl9»Ch90*tCf 


TPli 


-0.1616 


1204 


Hs.83848 


TPI1 




1202 


H8.83848 


TPI1 


0.0721 


1161 


HS.83848 


TPI1 


0.2265 


1052 


Ha.77899 


TPM1 


-0.1040 


1039 


tfs.77899 


TPM1 


-0.2999 


1035 


H6.77899 


TPM1 


-0.3821 


0783 


H8.77899 


TPM1 


0.0757 


1574 


Hs.194366 


TTR • 


-0.0085 


0809 


Hs.194366 


TTR 


0.0399 


2202 


Hs.76118 


UCHL1 


-0.0220 


1246 


Hs.76118 


UCHL1 


-0.1261 


1242 


Hs.76118 


UCHL1 


0,1473 


0606 


Hs.297753 


VIM 


0.0951 


0594 


H8.2977S3 


VIM 


-0.2664 


0508 


Hs.297763 


VIM 


0.1008 


0419 


Hs.207753 


VIM 


0.0032 


1279 


Hs.75544 


YWHAH 


0.0059 



Protein name 



. Cytokeratto 18 
CytokeratJn 16 
Cytokeratin 18 
Cytokerattn 18 
Cytokeratin 8 
Cytokerath 8 
Cytokeratin 8 
Cytokeratfei 8 
0P18$tammln) 
PDI(projy-4-OH-6) 
PDI(proV^OH-B) 
Prohfbltln 
Prohfbltln 
<H-Antltrtpeln 
orT-AntfWjpsIn 
<M-Antftn>sln 

Pulmonary surfactant-associated protefn 
Pulmonary surfactant-associated protein 
Troponin T 
Troponin T 

Trios* phosphate Jsomerase (TPI) 
Trtose phosphate ^omerase (TPI) 
Trtose phosphate Isomerase (TPO 
Trlose phosphate Isomer**© (TPI) 
Trtose phosphate .^omerase (TPI) 
"Woe© phosphate isomer** (TPQ 
TropomysJn ctearvDroduct 
Cytoskeleta} tropomyosin 
Tropomyosin 
Tropomyosins 1-5 
Transthyretin 
Tnanstrryretin multifnere . 

Ublquffln carbcxyl-termlnal hydrolase Isozyme L1 
UblquWn carbcoo/-terrnlnal hydrolase Isozyme L1 
UblquWn carbcxyi -terminal hydrolase Isozyme 11 
Vlmemln 

VTmermn-dertved protein (vld4) 
VTmentin-derived protein (vld2) 
Vlmentfn-dertvecl protein (vldl) 
14-3-3 1, 



such that $\rho_p(i)$ denotes the $i$th largest correlation coefficient for the $p$th
permutation. Hence, the expected correlation coefficient, $\rho_E(i)$, was the
average over the 60 permutations, $\rho_E(i) = \sum_{p=1}^{60} \rho_p(i)/60$. A scatter plot of
the observed correlations $\rho(i)$ versus the expected correlations is shown in
Fig. 2D. For this study, we chose threshold $\Delta = 0.116$ so that a correlation
would be considered significant if the absolute value of the difference between
$\rho(i)$ and $\rho_E(i)$ was greater than the threshold. Twenty-nine (including one
with observed correlation coefficient -0.4672) of 165 pairs of gene and
protein expression were called significant by this criterion, and the
permuted data generated an average of 5.1 falsely significant pairs of
gene and protein expression. This provided an estimated false
discovery rate (the percentage of pairs of gene and protein expression
identified by chance) for our data set.
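
The permutation procedure described above can be sketched in a few lines. This is an illustrative reading of the text, not the authors' code: the data are random numbers, the threshold `delta = 0.3` is arbitrary, and the helper names (`average_ranks`, `spearman`, `sorted_correlations`, `expected_correlations`, `count_calls`) are invented for the example. A plain random shuffle of the gene rows is assumed as the way "pseudo pairs" are formed.

```python
import math
import random

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

def sorted_correlations(protein, mrna):
    """Correlate row i of protein with row i of mrna; largest value first."""
    return sorted((spearman(p, m) for p, m in zip(protein, mrna)), reverse=True)

def expected_correlations(protein, mrna, n_perm, rng):
    """Average the i-th largest correlation over n_perm shuffles of the gene rows."""
    permuted_sets = []
    for _ in range(n_perm):
        shuffled = mrna[:]
        rng.shuffle(shuffled)  # pseudo pairs of genes and proteins
        permuted_sets.append(sorted_correlations(protein, shuffled))
    expected = [sum(s[i] for s in permuted_sets) / n_perm for i in range(len(protein))]
    return expected, permuted_sets

def count_calls(observed, expected, delta):
    """Number of ranks where |observed - expected| exceeds the threshold."""
    return sum(abs(o - e) > delta for o, e in zip(observed, expected))

# Synthetic data: 20 spots x 30 samples, log(1 + x) transformed.
rng = random.Random(0)
n_spots, n_samples = 20, 30
mrna = [[math.log1p(rng.uniform(0, 100)) for _ in range(n_samples)] for _ in range(n_spots)]
protein = [[math.log1p(rng.uniform(0, 100)) for _ in range(n_samples)] for _ in range(n_spots)]
protein[0] = [2 * v for v in mrna[0]]  # one spot that tracks its mRNA exactly

observed = sorted_correlations(protein, mrna)
expected, permuted_sets = expected_correlations(protein, mrna, n_perm=60, rng=rng)
delta = 0.3  # illustrative threshold, not the value used in the paper
n_called = count_calls(observed, expected, delta)
mean_false = sum(count_calls(s, expected, delta) for s in permuted_sets) / len(permuted_sets)
print(n_called, round(mean_false, 2))
```

Dividing `mean_false` by `n_called` gives the kind of estimated false discovery rate the paragraph refers to. Because Spearman's coefficient depends only on ranks, the log(1 + x) transform does not change the correlations themselves; it is kept here only to mirror the described preprocessing.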

RESULTS 

Correlation of Individual Proteins and mRNA Expression
within Each Tumor - We have examined quantitatively 165
protein spots on 2D gels representing 98 genes and compared
protein levels with mRNA levels for a cohort of 85 lung
adenocarcinomas and normal lung samples. Of the 165 protein
spots, 69 proteins were represented by only one known
spot on 2D gels for an individual gene, whereas 96 protein
spots showed multiple protein products from 29 different
genes. 2D Western blotting verified the proteins identified by
mass spectrometry when specific antibodies were available.
Spearman correlation coefficients of the proteins and their
associated mRNA for each protein spot were generated using
all 76 lung adenocarcinomas and nine non-neoplastic lung
tissues (see Tables I and II, and see Figs. 1 and 2). The
correlation coefficients (r) ranged from -0.467 to 0.442 (Fig.
2D). A total of 28 protein spots (21 genes) were found to have
a statistically significant correlation between expression of
Fig. 1. A, digital image of a silver-stained 2D-PAGE separation of a stage I lung adenocarcinoma showing protein spots separated by
[illegible] and isoelectric point. Twenty-eight protein spots whose expression levels are [illegible].
B and C, [illegible] the protein GRP58 [illegible] 2D Western blot [illegible]
adenocarcinoma cell line. D, the outlined areas of A showing the protein isoforms of Op18. E, 2D Western blot of Op18 [remainder illegible].



their protein and mRNA (r > 0.2445; p < 0.05). This accounts
for 17% (28/165) of the 165 protein spots. Among the 69
genes for which only a single protein spot was known (Table
I), nine genes (9/69, 13%) were observed to show a statistically
significant relationship between protein and mRNA
abundance (r > 0.2445; p < 0.05). The proteins whose expression
levels were correlated with their mRNA abundance
included those involved in signal transduction, carbohydrate
metabolism, apoptosis, protein post-translational modification,
structural proteins, and heat shock proteins (Table III).

Individual Isoforms of the Same Protein Have Different
Protein/mRNA Correlation Coefficients - Of the 165 protein
spots, 96 represent protein products of 29 genes with at least
two isoforms. Among these 96 protein spots, 19 (19/96 protein
spots, 20%) showed a statistically significant correlation
between their protein and mRNA expression (r > 0.2445; p <
0.05) (Table II) and represented 12 genes (12/29, 41%). Individual
isoforms of the same protein demonstrated different
protein/mRNA correlation coefficients. For example, 2D-PAGE/
Western analysis revealed four isoforms of OP18 differing in
regard to isoelectric point but similar in molecular weight.
Three of the four isoforms (spots 1492, 1493, and 1494) showed
a statistically significant correlation between their protein and
mRNA abundance (r = 0.3234, 0.3154, and 0.4003, respectively).
The fourth isoform (spot 1468) showed no correlation between
protein and mRNA expression (r = 0.0495). Similarly, just
one of five quantified isoforms of cytokeratin 8 (spot 439)
demonstrated a statistically significant correlation between protein
and mRNA abundance (r = 0.3049; p < 0.05) (Table II).
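
Because every isoform spot of a gene is compared against the same mRNA measurement, the spots can yield quite different coefficients. The sketch below illustrates this with invented numbers for three hypothetical spots; it uses the textbook no-ties shortcut $r = 1 - 6\sum d^2 / (n(n^2 - 1))$, which is an assumption of the example and only valid when no two values in a vector are equal.

```python
def ranks_no_ties(values):
    """1-based ranks; assumes all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(x, y):
    """Spearman r via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_no_ties(x), ranks_no_ties(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Made-up mRNA signal for one gene across five samples.
mrna = [1.0, 2.0, 3.0, 4.0, 5.0]

# Made-up intensities for three hypothetical isoform spots of that gene.
isoform_spots = {
    "spot_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with the mRNA
    "spot_b": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the mRNA rises
    "spot_c": [1.0, 3.0, 2.0, 5.0, 4.0],   # partly tracks the mRNA
}

r_by_spot = {spot: spearman_no_ties(values, mrna) for spot, values in isoform_spots.items()}
for spot, r in r_by_spot.items():
    print(spot, round(r, 3))
```

With real gel data, ties and missing values are common, so a tie-aware rank correlation would be the safer choice; the shortcut is used here only to keep the arithmetic visible.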

In addition to differences in the relationship between mRNA
levels and protein expression among separate isoforms, some
genes with very comparable mRNA levels showed a 24-fold
difference in their protein expression. Genes with comparable
protein expression levels also showed up to a 28-fold
variance in their mRNA levels.

Lack of Correlation for mRNA and Protein Expression when
Using Average Tumor Values across All 165 Protein Spots (98
Genes) - The relationship between mRNA and protein expression
was also examined by using the average expression
values for all samples. To analyze this relationship using this
approach, the average value for each protein or mRNA was
generated using all 85 lung tissue samples. The
normalized average protein values ranged from -0.0646 to
0.0979 (raw value 0.0036 to 4.1947), and the range for mRNA
was from 0 to 15260.5 for all 165 individual protein spots. The
Spearman correlation coefficient for the whole data set (165
protein spots/98 genes) was -0.025 (Fig. 3A). Even for the 28
protein spots (Fig. 2D) that were found to have a statistically
significant correlation between their mRNA and protein, use of





Fig. 2. [Caption largely illegible; recoverable fragments: verification analysis using SAM; a more detailed description of the method; correlation coefficients;
approximately 17% of the 165 proteins demonstrate a significant correlation.]



the average value resulted in a correlation coefficient value of
-0.035, which was not significant (Fig. 3B).
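
The average-value analysis asks a different question from the per-spot analysis: instead of correlating across samples within one spot, it collapses each spot to a single mean and correlates across spots. A minimal sketch with invented numbers follows; the spot names and values are hypothetical, and the no-ties Spearman shortcut is again an assumption of the example.

```python
from statistics import fmean

def ranks_no_ties(values):
    """1-based ranks; assumes all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(x, y):
    """Spearman r via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_no_ties(x), ranks_no_ties(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical protein and mRNA values: four spots, three samples each.
protein = {"s1": [1, 2, 3], "s2": [4, 4, 4], "s3": [0, 1, 2], "s4": [6, 6, 9]}
mrna = {"s1": [10, 20, 30], "s2": [5, 5, 5], "s3": [40, 40, 40], "s4": [1, 2, 3]}

spots = sorted(protein)
protein_means = [fmean(protein[s]) for s in spots]  # one value per spot
mrna_means = [fmean(mrna[s]) for s in spots]

r_across_spots = spearman_no_ties(protein_means, mrna_means)
r_within_s1 = spearman_no_ties(protein["s1"], mrna["s1"])
print(r_across_spots, r_within_s1)
```

In this toy data set, spot `s1` tracks its mRNA perfectly across samples, yet the across-spot correlation of the averages is strongly negative, which shows why the two analyses need not agree.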

Lack of a Relationship between Protein/mRNA Correlation
Coefficients and Average Protein Abundance - To determine
whether an absolute protein level might influence the correlation
with mRNA, the mean value of each protein (relative
abundance) and the Spearman protein/mRNA correlation
coefficients among all 85 samples were examined. No relationship
between the protein abundance and the correlation
coefficients was observed (r = 0.039; p > 0.05). A detailed
analysis of separate subsets of proteins with differing levels of
abundance (less than -0.0014, larger than -0.0014, or larger
than 0.0077) also showed a lack of correlation between mRNA
and protein expression among the 83 (50%), 82 (50%), and 41
(25%) of 165 total protein spots, respectively (r = 0.016, 0.08,
and 0.172, respectively).

Stage-related Changes in the Protein/mRNA Correlation
Coefficients - To determine whether the 21 genes (28 protein
spots) showing a significant correlation between the protein
and mRNA expression among all samples demonstrate
changes in this relationship during tumor progression, the
correlations were examined separately for stage I (n = 57) and
stage III (n = 19) lung adenocarcinomas (Table III). The number
of non-neoplastic lung samples (n = 9) was insufficient for
a separate correlation analysis of this group. Many of the
protein spots represent one of several known protein isoforms
for a given gene. The majority of genes (16/21) did not differ in
the protein/mRNA correlation between stage I and stage III
tumors, indicating a similar regulatory relationship between the
mRNA and protein spot. GRP-58, PSMC, SOD1, TPI1, and
VIM, however, were found to demonstrate significant differences
in the correlation coefficients between stage I and
stage III lung adenocarcinomas. For GRP-58, PSMC, and VIM
the change in the correlation coefficient was because of a
relative increase in protein expression in stage III tumors. For
SOD and TPI the change resulted from a relative decrease in
expression of this specific protein in stage III tumors.

DISCUSSION 

Relatively little is known about the regulatory mechanisms
controlling the complex patterns of protein abundance and
post-translational modification in tumors. Most reports
concerning the regulation of protein translation have focused on
Table III

Stage-dependent analysis of protein-mRNA correlation coefficients
r, correlation coefficient. Values in boldface indicate a significant difference between stage I and stage III.

Spot   Gene name   r (Stage I)   r (Stage III)   Function
1874   AKR1B1      0.269         0.106           Carbohydrate metabolism; electron transporter
2524   ANXA1       0.184         0.572           Phospholipase inhibitor; signal transduction
0994   ANXA4       0.660         0.362           Phospholipase inhibitor
0963   ANXA5       0.241         0.390           Phospholipase inhibitor; calcium binding; phospholipid binding
1314   DJ-1        0.363         0.354           Signal transduction
1405   FTL         0.126         0.358           Iron storage protein
0865   GAPD        0.243         0.681           Carbohydrate metabolism (glycolysis regulation)
0350   GRP58       0.327         -0.087          Signal transduction; protein disulfide isomerase
0264   HNRPK       0.360         0.243           RNA-binding protein (RNA processing/modification)
1192   HSPB3       0.457         0.633           Heat shock protein
0523   KRT18       0.116         0.371           Structural protein
0430   KRT8        0.323         0.436           Structural protein
1492   LAP18       0.483         0.663           Signal transduction; cell growth and maintenance
1638   LGALS1      0.200         0.628           Apoptosis; cell adhesion; cell size control
1252   PSMC        0.253         [illegible]     Protein degradation
1104   SFN         0.465         0.475           Signal transduction (protein kinase C inhibitor)
1454   SOD1        0.352         0.079           Oxidoreductase
1203   TPI1        0.378         0.009           Carbohydrate metabolism
0957   TPM1        0.475         0.225           Structural protein (muscle); control of heart
0593   VIM         -0.054        0.[?]56         Structural protein
0935   YWHAH       0.283         0.210           Signal transduction



one or several protein products (16). Celis et al. (19) found a
good correlation between transcript and protein levels among
40 well resolved, abundant proteins using a proteomic and
microarray study of bladder cancer. By comparing the mRNA
and protein expression levels within the same tumor samples,
we found that 17% (28/165) of the protein spots (21/98 genes)
show a statistically significant correlation between mRNA and
protein. These proteins appear to represent a diverse group of
gene products and include those involved in signal transduction,
carbohydrate metabolism, protein modification, cell structure,
heat shock, and apoptosis. These results suggest that
expression of this subset of 165 proteins is likely to be regulated
at the transcriptional level in these tissues. The majority of the
protein isoforms, however, did not correlate with mRNA levels,
and thus their expression is regulated by other mechanisms. We
also observed a subset of proteins that demonstrated a negative
correlation with the mRNA expression values; for example,
a haptoglobin isoform demonstrated a strong negative correlation with
its mRNA expression values. This may reflect negative feedback
on the mRNA or the protein or the presence of other regulatory
influences that are not understood currently.

Post-translational modification or processing will result in
individual protein products of the same gene migrating to
different locations on 2D-PAGE gels (20). Because the identity
of all possible isoforms for each protein examined has not
been characterized completely, this may influence the correlation
analyses performed in this study. This is partly because
of limitations of the 2D-PAGE and mass spectrometry technologies
(21, 22). Potential inconsistencies between mRNA
and protein correlations that have been reported may also be
because of differences, even in the same gene, in the mechanisms
of protein translation among different cells or as
measured in different laboratories (23).

In this study, we examined 165 protein spots identified in
lung adenocarcinomas. Ninety-six protein spots, representing
the products of 29 genes, contained at least two protein
isoforms. Nineteen of 96 protein spots, representing 12
genes, were shown to have a statistically significant correlation
between their protein and mRNA expression, suggesting
that the levels of these proteins reflect the transcription of the
corresponding genes. Differences in protein/mRNA correlations
were found among the individual isoforms of a given protein. For
example, of the four OP18 isoforms, three showed a statistically
significant correlation between the protein and mRNA expression
levels. The lack of relationship for the one isoform, however,
indicates that individual protein isoforms of the same gene
product can be regulated differentially. This is not unexpected
and likely reflects other post-translational mechanisms that can
influence isoform abundance in tissues and cancer.

In addition to the analyses of the correlation of mRNA/
protein within the same tumor samples, we also tested the
global relationship between mRNA and the corresponding
protein abundance across all 165 protein spots in the lung
samples. A protein and mRNA average value for each gene
was generated using all 85 lung tissue samples. We observed
a very wide range of normalized average protein and
mRNA values. The correlation coefficient generated using this
average value data set was -0.025, and even for the 28
protein spots that showed a statistically significant correlation
between individual mRNA and proteins, the correlation value
was only -0.035. This suggests that it is not possible to
predict overall protein expression levels based on average
mRNA abundance in lung cancer samples. This conclusion is
also supported by previous results from Anderson and Seilhamer
(24), who examined 19 genes in human liver cells, and
by Gygi et al. (25), who examined 106 genes in yeast. Both
studies found a lack of correlation between mRNA and protein
expression when average or overall levels were used.

The overall correlation of mRNA and protein levels across all
165 protein spots (A) and across 28 protein spots that contained
individual r values larger than 0.244 (B) are shown. Each protein
or mRNA mean value was calculated based on all 76 lung
adenocarcinomas and nine non-neoplastic lung samples using
quantitative 2D-PAGE and Affymetrix oligonucleotide microarrays.
The Spearman correlation coefficients for the two data sets
(A and B) were -0.025 and -0.035, respectively, indicating a lack
of correlation if mean values for mRNA and protein for all
samples are used.
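The rank-based (Spearman) coefficient used for these comparisons can be sketched as follows. This is a minimal illustration in plain Python, not the authors' analysis code; the function names and the example values are hypothetical, and ties are handled with average ranks as an assumption.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean mRNA and protein values for four spots (illustration only).
mrna_means = [120.0, 850.0, 40.0, 300.0]
protein_means = [0.8, 0.2, 0.5, 0.9]
r = spearman(mrna_means, protein_means)
```

The same function applies both to one spot across samples (the per-spot r values) and to mean values across spots (the global comparison); only the input vectors differ.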

A good correlation was reported when the 11 most abundant
proteins were examined in yeast (25), suggesting that the
level of protein abundance may be a factor that influences
the correlation between mRNA and protein. In the present
study, a fairly wide range of mean protein values among 165
protein spots in lung adenocarcinomas was observed, and
the correlation coefficients also varied from -0.467 to 0.442.
A comparison between the mean value of each protein and
the correlation coefficient generated using all 85 tissue samples
did not reveal a strong relationship between the overall
protein abundance and the correlation coefficients (r = 0.039;
p > 0.05). Detailed analysis of different subsets of protein
abundance also failed to show a correlation between mRNA and
protein expression. Thus, in contrast to yeast, a relationship
between mRNA/protein correlation coefficient and protein
abundance in human lung adenocarcinomas was not observed.
The results of this study indicate that the level of protein
abundance in lung adenocarcinomas is associated with the
corresponding levels of mRNA in 17% (28 proteins) of the
total 165 protein spots examined. This was substantially
higher than the amount predicted to result by chance alone
(which was 5.1) and suggests that a transcriptional mechanism
likely underlies the abundance of these proteins in lung
adenocarcinomas. We also demonstrate that the expression
of individual isoforms of the same protein may or may not
correlate with the mRNA, indicating that separate and likely
post-translational mechanisms account for the regulation of
isoform abundance. These mechanisms may also account for
the differences in the correlation coefficients observed between
stage I and stage III tumors, indicating that specific protein
isoforms show regulatory changes during tumor progression.
Further studies in lung adenocarcinomas will examine the
relationship between the expression of individual protein isoforms
and specific clinical-pathological features of these tumors, such
as the presence of angiolymphatic invasion, and nodal or pleural
surface involvement. The potential to identify specific protein
isoforms associated with biological behavior in lung
adenocarcinomas would be of considerable interest and will add to our
understanding of the regulation of gene products by transcriptional,
translational, and post-translational mechanisms.

We have determined the relationship between mRNA and protein expression levels for selected genes
expressed in the yeast Saccharomyces cerevisiae growing at mid-log phase. The proteins contained in total yeast 
cell lysate were separated by high-resolution two-dimensional (2D) gel electrophoresis. Over 150 protein spots 
were excised and identified by capillary liquid chromatography-tandem mass spectrometry (LC-MS/MS).
Protein spots were quantified by metabolic labeling and scintillation counting. Corresponding mRNA levels 
were calculated from serial analysis of gene expression (SAGE) frequency tables (V. E. Velculescu, L. Zhang, 
W. Zhou, J. Vogelstein, M. A. Basrai, D. E. Bassett, Jr., P. Hieter, B. Vogelstein, and K. W. Kinzler, Cell 
88:243-251, 1997). We found that the correlation between mRNA and protein levels was insufficient to predict 
protein expression levels from quantitative mRNA data. Indeed, for some genes, while the mRNA levels were 
of the same value the protein levels varied by more than 20-fold. Conversely, invariant steady-state levels of 
certain proteins were observed with respective mRNA transcript levels that varied by as much as 30-fold. 
Another interesting observation is that codon bias is not a predictor of either protein or mRNA levels. Our 
results clearly delineate the technical boundaries of current approaches for quantitative analysis of protein 
expression and reveal that simple deduction from mRNA transcript analysis is insufficient.



The description of the state of a biological system by the 
quantitative measurement of the system constituents is an es- 
sential but largely unexplored area of biology. With recent 
technical advances including the development of differential 
display-PCR (21), of cDNA microarray and DNA chip tech- 
nology (20, 27), and of serial analysis of gene expression 
(SAGE) (34, 35), it is now feasible to establish global and 
quantitative mRNA expression profiles of cells and tissues in 
species for which the sequence of all the genes is known. 
However, there is emerging evidence which suggests that 
mRNA expression patterns are necessary but are by them- 
selves insufficient for the quantitative description of biological 
systems. This evidence includes discoveries of posttranscrip- 
tional mechanisms controlling the protein translation rate (15), 
the half-lives of specific proteins or mRNAs (33), and the 
intracellular location and molecular association of the protein 
products of expressed genes (32). 

Proteome analysis, defined as the analysis of the protein 
complement expressed by a genome (26), has been suggested 
as an approach to the quantitative description of the state of a 
biological system by the quantitative analysis of protein expres- 
sion profiles (36). Proteome analysis is conceptually attractive 
because of its potential to determine properties of biological 
systems that are not apparent by DNA or mRNA sequence 
analysis alone. Such properties include the quantity of protein 
expression, the subcellular location, the state of modification, 
and the association with ligands, as well as the rate of change 
with time of such properties. In contrast to the genomes of a 
number of microorganisms (for a review, see reference 11) and 
the transcriptome of Saccharomyces cerevisiae (35), which have 
been entirely determined, no proteome map has been com- 
pleted to date. 

The most common implementation of proteome analysis is 
the combination of two-dimensional gel electrophoresis (2DE)
(isoelectric focusing-sodium dodecyl sulfate [SDS]-polyacrylamide
gel electrophoresis) for the separation and quantitation
of proteins with analytical methods for their identification. 
2DE permits the separation, visualization, and quantitation of 
thousands of proteins reproducibly on a single gel (18, 24). By 
itself, 2DE is strictly a descriptive technique. The combination 
of 2DE with protein analytical techniques has added the pos- 
sibility of establishing the identities of separated proteins (1,2) 
and thus, in combination with quantitative mRNA analysis, of 
correlating quantitative protein and mRNA expression mea- 
surements of selected genes. 

The recent introduction of mass spectrometric protein anal- 
ysis techniques has dramatically enhanced the throughput and 
sensitivity of protein identification to a level which now permits 
the large-scale analysis of proteins separated by 2DE. The 
techniques have reached a level of sensitivity that permits the 
identification of essentially any protein that is detectable in the 
gels by conventional protein staining (9, 29). Current protein 
analytical technology is based on the mass spectrometric gen- 
eration of peptide fragment patterns that are idiotypic for the 
sequence of a protein. Protein identity is established by corre- 
lating such fragment patterns with sequence databases (10, 22, 
37). Sophisticated computer software (8) has automated the 
entire process such that proteins are routinely identified with 
no human interpretation of peptide fragment patterns. 

In this study, we have analyzed the mRNA and protein levels 
of a group of genes expressed in exponentially growing cells of 
the yeast S. cerevisiae. Protein expression levels were quantified 
by metabolic labeling of the yeast proteins to a steady state, 
followed by 2DE and liquid scintillation counting of the se- 
lected, separated protein species. Separated proteins were 
identified by in-gel tryptic digestion of spots with subsequent 
analysis by microspray liquid chromatography-tandem mass 
spectrometry (LC-MS/MS) and sequence database searching. 
The corresponding mRNA transcript levels were calculated 
from SAGE frequency tables (35). 

This study, for the first time, explores a quantitative
comparison of mRNA transcript and protein expression levels for
a relatively large number of genes expressed in the same
metabolic state. The resultant correlation is insufficient for
prediction of protein levels from mRNA transcript levels. We have
also compared the relative amounts of protein and mRNA
with the respective codon bias values for the corresponding
genes. This comparison indicates that codon bias by itself is
insufficient to accurately predict either the mRNA or the
protein expression levels of a gene. In addition, the results
demonstrate that only highly expressed proteins are detectable by
2DE separation of total cell lysates and that therefore the
construction of complete proteome maps with current technology
will be very challenging, irrespective of the type of organism.

FIG. 1. Schematic illustration of proteome analysis by 2DE and mass spectrometry. In part I, proteins are separated by 2DE, stained spots are excised and subjected
to in-gel digestion with trypsin, and the resulting peptides are separated by on-line capillary high-performance liquid chromatography. In part II, a peptide is shown
eluting from the column in part I. The peptide is ionized by electrospray ionization and enters the mass spectrometer. The mass of the ionized peptide is detected and
the first quadrupole mass filter allows only the specific mass-to-charge ratio of the selected peptide ion to pass into the collision cell. In the collision cell the energized,
ionized peptides collide with neutral argon gas molecules. Fragmentation of the peptide is essentially random but occurs mainly at the peptide bonds, resulting in smaller
peptides of differing lengths (masses). These peptide fragments are detected as a tandem mass (MS/MS) spectrum in the third quadrupole mass filter where two ion
series are recorded simultaneously, one each from sequencing inward from the N and C termini of the peptide, respectively. In part III, the MS/MS spectrum from the
selected, ionized peptide is compared to predicted tandem mass spectra computer generated from a sequence database. Provided that the peptide sequence exists in
the database, the peptide and, by association, the protein from which the peptide was derived can be identified. Unambiguous protein identification is attained in a single
analysis because multiple peptides are identified as being derived from the same protein.

MATERIALS AND METHODS 

Yeast strain and growth conditions. The source of protein and message
transcripts for all experiments was YPH499 (MATa ura3-52 lys2-801 ade2-101
leu2-Δ1 his3-Δ200 trp1-Δ63) (30). Logarithmically growing cells were obtained by
growing yeast cells to early log phase (3 x 10^6 cells/ml) in YPD rich medium
(YPD supplemented with 6 mM uracil, 4.8 mM adenine, and 24 mM tryptophan)
at 30°C (35). Metabolic labeling of protein was accomplished in YPD medium
exactly as described elsewhere (4) with the exception that 1 ml of cells was
labeled with 3 mCi to offset methionine present in YPD medium. Protein was
harvested as described by Garrels and coworkers (12). Harvested protein was
lyophilized, resuspended in isoelectric focusing gel rehydration solution, and
stored at -80°C.

2DE. Soluble proteins were run in the first dimension by using a commercial
flatbed electrophoresis system (Multiphor II; Pharmacia Biotech). Immobilized
polyacrylamide gel (IPG) dry strips with nonlinear pH 3.0 to 10.0 gradients
(Amersham-Pharmacia Biotech) were used for the first-dimension separation.
Forty micrograms of protein from whole-cell lysates was mixed with IPG strip
rehydration buffer (8 M urea, 2% Nonidet P-40, 10 mM dithiothreitol), and 250
to 380 µl of solution was added to individual lanes of an IPG strip rehydration
tray (Amersham-Pharmacia Biotech). The strips were allowed to rehydrate at
room temperature for 1 h. The samples were run at 300 V-10 mA-5 W for 2 h,
then ramped to 3,500 V-10 mA-5 W over a period of 3 h, and then kept at 3,500
V-10 mA-5 W for 15 to 19 h. At the end of the first-dimension run (60 to 70
kV·h), the IPG strips were reequilibrated for 8 min in 2% (wt/vol) dithiothreitol in
2% (wt/vol) SDS-6 M urea-30% (wt/vol) glycerol-0.05 M Tris HCl (pH 6.8) and
for 4 min in 2.5% iodoacetamide in 2% (wt/vol) SDS-6 M urea-30% (wt/vol)
glycerol-0.05 M Tris HCl (pH 6.8). Following reequilibration, the strips were
transferred and apposed to 10% polyacrylamide second-dimension gels.
Polyacrylamide gels were poured in a casting stand with 10% acrylamide-2.67%
piperazine diacrylamide-0.375 M Tris base-HCl (pH 8.8)-0.1% (wt/vol) SDS-0.05%
(wt/vol) ammonium persulfate-0.05% TEMED (N,N,N',N'-tetramethylethylenediamine)
in Milli-Q water. The apparatus used to run second-dimension gels
was a noncommercial apparatus from Oxford Glycosciences, Inc. Once the IPG
strips were apposed to the second-dimension gels, they were immediately run at
50 mA (constant)-500 V-85 W for 20 min, followed by 200 mA (constant)-500
V-85 W until the buffer front line was 10 to 15 mm from the bottom of the gel.
Gels were removed and silver stained according to the procedure of Shevchenko
et al. (29).

FIG. 2. 2D silver-stained gel of the proteins in yeast total cell lysate. Proteins were separated in the first dimension (horizontal) by isoelectric focusing and then in
the second dimension (vertical) by molecular weight sieving. Protein spots (156) were chosen to include the entire range of molecular weights, isoelectric focusing points,
and staining intensities. Spots were excised, and the corresponding protein was identified by mass spectrometry and database searching. The spots are labeled on the
gel and correspond to the data presented in Table 1. Molecular weights are given in thousands.

Protein identification. Gels were exposed to X-ray film overnight, and then the 
silver staining and film were used to excise 156 spots of varying intensities, 
molecular weights, and isoelectric focusing points. In order to increase the 
detection limit by mass spectrometry, spots were cut out and pooled from up to 
four identical cold, silver-stained gels. In-gel tryptic digests of pooled spots were 
performed as described previously (29). Tryptic peptides were analyzed by mi- 
crocapillary LC-MS with automated switching to MS/MS mode for peptide 
fragmentation. Spectra were searched against the composite OWL protein se- 
quence database (version 30.2; 250,514 protein sequences) (24a) by using the 
computer program Sequest (8), which matches theoretical and acquired tandem 
mass spectra. A protein match was determined by comparing the number of 
peptides identified and their respective cross-correlation scores. All protein 
identifications were verified by comparison with theoretical molecular weights 
and isoelectric points. 



mRNA quantitation. Velculescu and coworkers have previously generated 
frequency tables for yeast mRNA transcripts from the same strain grown under 
the same stated conditions as described herein (35). The SAGE technology is 
based on two main principles. First, a short sequence tag (15 bp) that contains 
sufficient information uniquely to identify a transcript is generated. A single tag 
is usually generated from each mRNA transcript in the cell which corresponds to 
15 bp at the 3'-most cutting site for NlaIII. Second, many transcript tags can be
concatenated into a single molecule and then sequenced, revealing the identity of 
multiple tags simultaneously. Over 20,000 transcripts were sequenced from yeast 
strain YPH499 growing at mid-log phase on glucose. Assuming the previously 
derived estimate of 15,000 mRNA molecules per cell (16), this would represent 
a 1.3-fold coverage even for mRNA molecules present at a single copy per cell 
and would provide a 72% probability of detecting such transcripts. Computer 
software which took for input the gene detected, examined the nucleotide se- 
quence, and performed the calculation as described by Velculescu and coworkers 
(35) was written. In practice, we found that for 21 of 128 (16%) genes examined 
viable mRNA levels from SAGE data could not be calculated. This was because 
(i) no CATG site was found in the open reading frame (ORF), (ii) a CATG site 
was found but the corresponding 10-bp putative SAGE tag was not found in the 
frequency tables, or (iii) identical putative SAGE tags were present for multiple
genes (e.g., TDH2_YEAST and TDH3_YEAST).
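The tag lookup and the coverage arithmetic described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the software the authors wrote: the function names are hypothetical, the tag is taken as the 10 bp following the 3'-most CATG within the supplied sequence, copy number is assumed to scale linearly with tag count, and the detection probability assumes Poisson sampling.

```python
import math

ANCHOR = "CATG"          # NlaIII recognition site used as the SAGE anchor
TAG_LENGTH = 10          # putative tag length mentioned in the text
MRNA_PER_CELL = 15_000   # estimate of mRNA molecules per cell cited in the text


def sage_tag(orf_sequence):
    """Return the 10-bp tag after the 3'-most CATG, or None if unavailable.

    None covers failure mode (i) in the text: no CATG site in the ORF.
    """
    seq = orf_sequence.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    tag = seq[pos + len(ANCHOR): pos + len(ANCHOR) + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def copies_per_cell(tag_count, total_tags, mrna_per_cell=MRNA_PER_CELL):
    """Assumed linear scaling from tag frequency to mRNA copies per cell."""
    return tag_count * mrna_per_cell / total_tags


def detection_probability(coverage):
    """Chance of sampling a single-copy transcript at least once (Poisson)."""
    return 1.0 - math.exp(-coverage)


# With about 20,000 tags and 15,000 mRNAs per cell, coverage is about 1.3-fold.
p_single_copy = detection_probability(1.3)
```

Under these assumptions one tag in 20,000 corresponds to 0.75 copies per cell, which is in line with the smallest mRNA values listed in Table 1 (0.7, 1.5, 2.2, ...), and the Poisson estimate for 1.3-fold coverage comes out near the 72% quoted in the text.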
TABLE 1. Expressed genes identified from 2D gel in Fig. 2

Mol wt     pI      Spot no.  YPD gene name^a  Protein abundance   mRNA abundance  Codon bias
                                              (10^3 copies/cell)  (copies/cell)
17,259     6.75    133       CPR1             15.2                61.7            0.769
18,702     4.80    83        EGD2             20.1                5.2             0.724
18,726     4.44    147       YKL056C          61.2                88.4            0.831
18,978     5.95    135       YER067W          3.7                 6.7             0.118
19,108     5.04    130       YLR109W          94.4                9.7             0.680
19,681     9.08    136       ATP7             11.0                NA^b,c          0.246
20,505     6.07    111       GUK1             16.5                3.7             0.422
21,444     5.25    148       SAR1             5.4                 10.4            0.455
21,583     4.98    95        TSA1             110.6               40.1            0.845
22,602     4.30    80        EFB1             66.1                23.8            0.875
23,079     6.29    112       SOD2             12.6                2.2             0.351
23,743     5.44    137       HSP26            NA^d                0.7             0.434
24,033     5.97    96        ADK1             17.4                16.4            0.656
24,058     4.43    143       YKL117W          29.2                10.4            0.339
24,353     6.30    140       TFS1             8.1                 0.7             0.146
24,662     5.85    99        URA5             25.4                6.0             0.359
24,808     6.33    97        GSP1             26.3                5.2             0.735
24,908     8.73    122       RPS5             18.6                NA^c            0.899
25,081     4.65    81        MRP8             9.3                 NA^c            0.241
25,960     6.06    116       RPE1             5.8                 0.7             0.372
26,378     9.55    127       RPS3             96.8                NA^c            0.863
26,467     5.18    100       VMA4             10.5                3.7             0.427
26,661     5.84    98        TPI1             NA^d                NA^c            0.900
27,156     5.56    93        PRE8             6.9                 0.7             0.129
27,334     6.13    115       YHR049W          18.4                2.2             0.520
27,472     5.33    92        YNL010W          31.6                3.7             0.421
27,480     8.95    123       GPM1             10.0                169.4           0.902
27,480     8.95    124       GPM1             231.4               169.4           0.902
27,480     8.95    125       GPM1             7.5                 169.4           0.902
27,809     5.97    139       HOR2             5.7                 0.7             0.381
27,874     4.46    78        YST1             13.6                52.8            0.805
28,595     4.51    41        PUP2             4.4                 0.7             0.147
29,156     6.59    114       YMR226C          14.5                2.2             0.283
29,244     8.40    120       DPM1             5.0                 11.2            0.362
29,443     5.91    48        PRE4             3.4                 3.7             0.162
30,012     6.39    138       PRB1             21.2                1.5             0.449
30,073     4.63    77        BMH1             14.7                28.2            0.454
30,296     7.94    121       OMP2             67.4                41.6            0.499
30,435     6.34    89        GPP1             70.2                11.2            0.703
31,332     5.57    88        ILV6             13.9                3.0             0.402
32,159     5.46    113       IPP1             63.1                3.7             0.752
32,263     6.00    149       HIS1             22.4                4.5             0.232
33,311     5.35    84        SPE3             15.1                6.7             0.468
34,465     5.60    129       ADE1             8.7                 5.2             0.305
34,762     5.32    85        SEC14            10.9                6.0             0.373
34,797     5.85    42        URA1             49.5                8.9             0.237
34,799     6.04    90        BEL1             103.2               81.0            0.875
35,556     5.97    43        YDL124W          6.4                 4.5             0.206
35,619     8.41    59        TDH1             69.8                32.r            0.940
35,650     5.49    68        CAR1             5.2                 3.0             0.339
35,712     6.72    117       TDH2             49.6                473.0^c         0.982
35,712     6.72    154       TDH2             863.5               473.0^c         0.982
35,712     6.72    155       TDH2             79.4                473.0^c         0.982
36,272     4.85    128       APA1             8.7                 0.7             0.425
36,358     5.05    75        YJR105W          17.6                17.1            0.522
36,358     5.05    76        YJR105W          27.5                17.1            0.522
36,596     6.37    79        ADH2             58.9                260.0^c         0.711
36,714     6.30    102       ADH1             746.1               260.0           0.913
36,714     6.30    103       ADH1             17.6                260.0           0.913
36,714     6.30    104       ADH1             61.4                260.0           0.913
36,714     6.30    105       ADH1             52.7                260.0           0.913
37,033     6.23    44        TAL1             44.8                3.7             0.701
37,796     7.36    57        IDH2             29.4                6.7             0.330
37,886     6.49    106       ILV5             76.0                4.5             0.892
38,700     7.83    55        BAT1             30.9                11.2            0.469
38,702     6.24    46        QCR2             NA^d                2.2             0.326


Mol wt     pI      Spot no.  YPD gene name^a  Protein abundance   mRNA abundance  Codon bias
                                              (10^3 copies/cell)  (copies/cell)
39,477     5.58    86        FBA1             17.8                183.6           0.935
39,477     5.58    87        FBA1             427.2               183.6           0.935
39,540     6.50    150       HOM2             60.3                4.5             0.592
39,561     6.12    156       PSA1             96.4                27.5            0.718
41,158     6.01    49        YNL134C          14.9                1.5             0.316
41,623     7.18    58        BAT2             19.0                8.9             0.250
41,728     7.29    110       ERG10            24.1                4.5             0.543
41,900     5.42    74        TOM40            22.3                2.2             0.375
42,402     6.29    45        CYS3             6.7                 8.9             0.621
42,883     5.63    67        DYS1             15.8                5.2             0.526
43,409     6.31    107       SER1             10.5                1.5             0.292
43,421     5.59    91        ERG6             2.2                 14.1            0.408
44,174     7.32    56        YBR025C          13.1                6.0             0.684
44,682     4.99    72        TIF1             2.9                 39.4            0.834
44,707     7.77    108       PGK1             23.7                165.7           0.897
44,707     7.77    109       PGK1             315.2               165.7           0.897
46,080     6.72    30        CAR2             15.4                NA^c            0.495
46,383     8.52    53        IDP1             7.7                 0.7             0.436
46,553     5.98    47        IDP2             32.4                NA^c            0.197
46,679     6.39    50        ENO1             35.4                0.7             0.930
46,679     6.39    51        ENO1             6.6                 0.7             0.930
46,679     6.39    52        ENO1             2.2                 0.7             0.930
46,773     5.82    63        ENO2             15.5                289.1           0.960
46,773     5.82    64        ENO2             635.5               289.1           0.960
46,773     5.82    65        ENO2             93.0                289.1           0.960
46,773     5.82    66        ENO2             31.0                289.1           0.960
47,402     6.09    126       COR1             2.5                 0.7             0.422
47,666     8.98    54        AAT2             11.7                6.0             0.338
48,364     5.25    73        WTM1             74.5                13.4            0.365
48,530     6.20    61        MET17            38.1                29.0            0.576
48,904     5.18    69        LYS9             16.2                3.7             0.463
48,987     4.90    153       SUP45            29.6                11.9            0.377
49,727     5.47    70        PRO2             13.6                5.2             0.297
49,912     9.27    62        TEF2             558.5               282.0           0.932
50,444     5.67    35        YDR190C          4.8                 2.2             0.228
50,837     6.11    32        YEL047C          3.8                 1.5             0.387
50,891     4.59    151       TUB2             11.2                7.4             0.404
51,547     6.80    27        LPD1             18.9                2.2             0.351
52,216     7.25    29        SHM2             19.7                7.4             0.722
52,859     5.54    37        YFR044C          30.2                6.7             0.442
53,798     5.19    71        HXK2             26.5                7.4             0.756
53,803     6.05    145       GYP6             4.4                 0.7             0.147
54,403     5.29    39        ALD6             37.7                2.2             0.664
54,403     5.29    40        ALD6             6.6                 2.2             0.664
54,502     6.20    31        ADE13            6.3                 1.5             0.417
54,543     7.75    25        PYK1             225.3               101.8           0.965
54,543     7.75    26        PYK1             39.8                101.8           0.965
55,221     6.66    146       YEL071W          16.3                3.0             0.244
55,295     4.35    134       PDI1             66.2                14.1            0.589
55,364     5.98    24        GLK1             22.6                6.0             0.237
55,481     7.97    118       ATP1             21.6                2.2             0.637
55,886     6.47    28        CYS4             22.2                NA^c            0.444
56,167     5.83    33        ARO8             14.3                3.0             0.324
56,167     5.83    34        ARO8             9.1                 3.0             0.324
56,584     6.36    20        CYB2             18.9                NA^c            0.259
57,366     5.53    60        FRS2             2.3                 0.7             0.451
57,383     5.98    144       ZWF1             5.6                 0.7             0.215
57,464     5.49    36        THR4             21.4                3.7             0.508
57,512     5.50    7         SRV2             6.5                 NA^c            0.260
57,727     4.92    152       VMA2             33.7                8.9             0.546
58,573     6.47    17        ACH1             4.4                 1.5             0.327
58,573     6.47    18        ACH1             5.4                 1.5             0.327
61,353     5.87    21        PDC1             6.5                 200.7           0.962
61,353     5.87    22        PDC1             303.2               200.7           0.962
61,353     5.87    23        PDC1             16.3                200.7           0.962
61,649     5.54    38        CCT8             2.2                 1.5             0.271


Mol wt     pI      Spot no.  YPD gene name^a  Protein abundance   mRNA abundance  Codon bias
                                              (10^3 copies/cell)  (copies/cell)
61,902     6.21    101       PDC5             4.3                 NA^c            0.828
62,266     6.19    16        ICL1             20.1                NA^c            0.327
62,862     8.02    19        ILV3             5.3                 4.5             0.548
63,082     6.40    119       PGM2             2.2                 3.0             0.402
64,335     5.77    5         PAB1             30.4                1.5             0.616
66,120     5.42    8         STI1             6.7                 0.7             0.313
66,120     5.42    9         STI1             6.4                 0.7             0.313
66,450     5.29    141       SSB2             7.0                 NA^c            0.880
66,450     5.29    142       SSB2             2.3                 NA^c            0.880
66,456     5.23    10        SSB1             64.5                79.5            0.907
66,456     5.23    11        SSB1             59.0                79.5            0.907
66,456     5.23    12        SSB1             13.7                79.5            0.907
68,397     5.82    82        LEU4             3.1                 3.0             0.407
69,313     4.90    13        SSA2             24.3                18.6            0.892
69,313     4.90    14        SSA2             77.1                18.6            0.892
74,378     8.46    15        YKL029C          2.8                 3.7             0.353
75,396     5.82    6         GRS1             5.5                 7.4             0.500
85,720     6.25    1         MET6             2.0                 NA^c            0.772
85,720     6.25    2         MET6             10.9                NA^c            0.772
85,720     6.25    3         MET6             1.4                 NA^c            0.772
93,276     6.11    131       EFT1             17.9                41.6            0.890
93,276     6.11    132       EFT1             5.7                 41.6            0.890
102,064^e  6.61^e  94        ADE3             4.8                 5.2             0.423
107,482^e  5.33^e  4         MCM3             2.7                 NA^c            0.240


a YPD gene names are available from the YPD website (39).
b NA, calculation could not be performed or was not available.
c mRNA data inconclusive or NA.
d No methionines in predicted ORF; therefore, protein concentration was not determined.
e Measured molecular weight or pI did not match theoretical molecular weight or pI.
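One way to read Table 1 against the abstract's claim (equal mRNA levels, widely differing protein levels) is to group spots by mRNA value and compare protein abundances within each group. The sketch below is only an illustration: it uses a handful of rows copied from the table, treats each spot separately rather than summing isoforms per gene (an assumption), and takes the protein column to be in units of 10^3 copies per cell. The function name is hypothetical.

```python
from collections import defaultdict

# (gene, spot no., protein abundance [10^3 copies/cell], mRNA [copies/cell])
# A small subset of Table 1 rows, chosen for illustration.
ROWS = [
    ("TFS1", 140, 8.1, 0.7),
    ("FRS2", 60, 2.3, 0.7),
    ("ENO1", 50, 35.4, 0.7),
    ("GYP6", 145, 4.4, 0.7),
    ("GPM1", 124, 231.4, 169.4),
    ("PGK1", 109, 315.2, 165.7),
]


def protein_fold_range(rows):
    """Map each shared mRNA level to max/min protein abundance at that level."""
    by_mrna = defaultdict(list)
    for _gene, _spot, protein, mrna in rows:
        by_mrna[mrna].append(protein)
    # Only mRNA levels shared by at least two spots give a meaningful ratio.
    return {
        mrna: max(proteins) / min(proteins)
        for mrna, proteins in by_mrna.items()
        if len(proteins) > 1
    }


fold_ranges = protein_fold_range(ROWS)
```

Running the same grouping over all rows of the table, rather than this subset, would be needed to check the fold differences reported in the abstract.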



Protein quantitation. [ 35 S]methionine-labeled gels were exposed to X-ray film 
overnight, and then the silver stain and film were used to excise 156 spots of 
varying intensities, molecular weights, and pis. The excised spots were placed in 
0.6-ml microcentrifuge tubes, and scintillation cocktail (100 u.1) was added. The 
samples were vortexed and counted. In addition, two parallel gels were electro- 
blotted to polyvinylidene difluoride membranes. The membranes were exposed 
to X-ray film, and four intense single spots were excised from each membrane 
and subjected to amino acid analysis. For these four spots, a mean of 209 ± 4 
cpm/pmol of protein/methionine was found. This number was used to quantitate 
all remaining spots in conjunction with the number of methionines present in the 
protein. 
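
The quantitation step described above reduces to a single division. The following is a minimal Python sketch, assuming only the calibration reported in the text (a mean of 209 cpm per pmol of protein per methionine). The function name and the example counts are hypothetical, and converting pmol to copies per cell would additionally require the number of cells loaded, which this passage does not state.

```python
# Calibration from the four amino-acid-analyzed spots (value quoted in the text).
CPM_PER_PMOL_PER_MET = 209.0


def pmol_from_cpm(cpm: float, n_met: int,
                  specific_activity: float = CPM_PER_PMOL_PER_MET) -> float:
    """Estimate pmol of protein in a spot from its scintillation counts.

    cpm    -- counts per minute measured for the excised spot
    n_met  -- number of methionines in the mature protein (must be > 0)
    """
    if n_met <= 0:
        # Proteins without methionine cannot be quantitated by this method.
        raise ValueError("protein has no methionine; cannot quantitate")
    return cpm / (specific_activity * n_met)


# Hypothetical spot: 2,090 cpm from a protein with 10 methionines.
print(pmol_from_cpm(2090.0, 10))
```

The guard on `n_met` mirrors the exclusion of proteins whose sequence contains no methionine.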

To ensure that proteins were labeled to equilibrium, parallel 2D gels were 
prepared and run on yeast metabolically labeled for 1, 2, 6, or 18 h. The
corresponding 156 spots were excised from each gel, and radioactivity was mea- 
sured by liquid scintillation counting for each spot. Calculated protein levels were 
highly reproducible for all time points measured after 1 h. 

Calculation of codon bias and predicted half-life. Codon bias values were 
extracted from the YPD spreadsheet (17). Protein half-lives were calculated 
based on the N-end rule (33). When the N-terminal processing was not known 
experimentally, it was predicted based on the affinity of methionine
aminopeptidase (31).

RESULTS 

Characteristics of proteome approach. Nearly every facet of 
proteome analysis hinges on the unambiguous identification of 
large numbers of expressed proteins in cells. Several tech- 
niques have been described previously for the identification of 
proteins separated by 2DE, including N-terminal and internal 
sequencing (1, 2), amino acid analysis (38), and more recently 
mass spectrometry (25). We utilized techniques based on mass 
spectrometry because they afford the highest levels of sensitiv- 
ity and provide unambiguous identification. The specific pro- 
cedure used is schematically illustrated in Fig. 1 and is based 
on three principles. First, proteins are removed from the gel by
proteolytic in-gel digestion, and the resulting peptides are separated
by on-line capillary high-performance liquid chromatography.
Second, the eluting peptides are ionized and detected, and
the specific peptide ions are selected and fragmented by the 
mass spectrometer. To achieve this, the mass spectrometer 
switches between the MS mode (for peptide mass identifica- 
tion) and the MS/MS mode (for peptide characterization and 
sequencing). Selected peptides are fragmented by a process 
called collision-induced dissociation (CID) to generate a tan- 
dem mass spectrum (MS/MS spectrum) that contains the pep- 
tide sequence information. Third, individual CID mass spectra 
are then compared by computer algorithms to predicted spec- 
tra from a sequence database. This results in the identification 
of the peptide and, by association, the protein(s) in the spot. 
Unambiguous protein identification is attained in a single anal- 
ysis by the detection of multiple peptides derived from the 
same protein. 

Protein identification. Yeast total cell protein lysate (40 µg),
metabolically labeled with [35S]methionine, was electrophoretically
separated by isoelectric focusing in the first dimension
and by SDS-10% polyacrylamide gel electrophoresis in
the second dimension. Proteins were visualized by silver stain- 
ing and by autoradiography. Of the more than 1,000 proteins 
visible by silver staining, 156 spots were excised from the gel 
and subjected to in-gel tryptic digestion, and the resulting 
peptides were analyzed and identified by microspray LC- 
MS/MS techniques as described above. The proteins in this 
study were all identified automatically by computer software 
with no human interpretation of mass spectra. They are indi- 
cated in Fig. 2 and detailed in Table 1. 

The CID spectra shown in Fig. 3 indicate that the quality of 
the identification data generated was suitable for unambiguous 
protein identification. The spectra represent the amino acid 
sequences of tryptic peptides NSGDIVNLGSIAGR (Fig. 3A) 
and FAVGAFTDSLR (Fig. 3B). Both peptides were derived 
from protein S57593 (hypothetical protein YMR226C), which 
migrated to spot 114 (molecular weight, 29,156; pI, 6.59) in the
2D gel in Fig. 2. Five other peptides from the same analysis 
were also computer matched to the same protein sequence. 

Protein and mRNA quantitation. For the 156 genes investi- 
gated, the protein expression levels ranged from 2,200 (PGM2) 
to 863,000 (TDH2/TDH3) copies/cell. The levels of mRNA for 
each of the genes identified were calculated from SAGE fre- 
quency tables (35). These tables contain the mRNA levels for 
4,665 genes in yeast strain YPH499 grown to mid-log phase in 
YPD medium on glucose as a carbon source. In some in- 
stances, the mRNA levels could not be calculated for reasons 
stated in Materials and Methods. For the proteins analyzed in 
this study, mean transcript levels varied from 0.7 to 473 copies/ 
cell. 

Selection of the sample population for mRNA-protein ex- 
pression level correlation. The protein spots selected for iden- 
tification were selected from spots visible by silver staining in 
the 2D gel. An attempt was made not to include spots where 
overlap with other spots was readily apparent. The number of 
proteins identified was 156 (Table 1). Some proteins migrated 
to more than one spot (presumably due to differential protein 
processing or modifications), and protein levels from these 
spots were calculated by integrating the intensities of the dif- 
ferent spots. The 156 protein spots analyzed represented the 
products of 128 different genes. Genes were excluded from the 
correlation analysis only if part of the data set was missing; i.e., 
genes were excluded if (i) no mRNA expression data were 
available for the protein or putative SAGE tags were ambig- 
uous, (ii) the amino acid sequence did not contain methionine, 
(iii) more than a single protein was conclusively identified as
migrating to the same gel spot, or (iv) the theoretical and
observed pIs and molecular weights could not be reconciled.
After these criteria were applied, the number of genes used in 
the correlation analysis was 106. 



Codon bias and predicted half-lives. Codon bias is thought 
to be an indicator of protein expression, with highly expressed 
proteins having large codon bias values. The codon bias distribution
for the entire set of more than 6,000 predicted yeast
gene ORFs is presented in Fig. 4A. The interval with the
largest frequency of genes is between the codon bias values of 
0.0 and 0.1. This segment contains more than 2,500 genes. The 
distribution of the codon bias values of the 128 different genes 
found in this study (all protein spots from Fig. 2) is shown in 
Fig. 4B, and protein half-lives (predicted from applying the 
N-end rule [33] to the experimentally determined or predicted 
protein N termini) are shown in Fig. 4C. No genes were iden- 
tified with codon bias values less than 0.1 even though thou- 
sands of genes exist in this category. In addition, nearly all of 
the proteins identified had long predicted half-lives (greater 
than 30 h). 

Correlation of mRNA and protein expression levels. The
correlation between mRNA and protein levels of the genes
selected as described above is shown in Fig. 5. For the entire 
group (106 genes) for which a complete data set was gener- 
ated, there was a general trend of increased protein levels 
resulting from increased mRNA levels. The Pearson product 
moment correlation coefficient for the whole data set (106 
genes) was 0.935. This number is highly biased by a small 
number of genes with very large protein and message levels. A 
more representative subset of the data is shown in the inset of 
Fig. 5. It shows genes for which the message level was below 10 
copies/cell and includes 69% (73 of 106 genes) of the data used 
in the study. The Pearson product moment correlation coeffi- 
cient for this data set was only 0.356. We also found that levels 
of protein expression coded for by mRNA with comparable 
abundance varied by as much as 30-fold and that the mRNA 
levels coding for proteins with comparable expression levels 
varied by as much as 20-fold. 

The distortion of the correlation value induced by the un- 
even distribution of the data points along the x axis is further 
demonstrated by the analysis in Fig. 6. The 106 samples in- 
cluded in the study were ranked by protein abundance, and the 
Pearson product moment correlation coefficient was repeat- 
edly calculated after including progressively more, and higher- 
abundance, proteins in each calculation. The correlation values 
remained relatively stable in the range of 0.1 to 0.4 if the 
lowest-expressed 40 to 95 proteins used in this study were 
included. However, the correlation value steadily climbed by 
the inclusion of each of the 11 very highly expressed proteins. 
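
The ranking-and-recalculation procedure behind Fig. 6 can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical and the six data points are invented to show the effect, not taken from the study. It assumes the coefficient is the ordinary Pearson product moment correlation computed on raw (untransformed) values.

```python
import math


def pearson(xs, ys):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cumulative_correlation(pairs, start):
    """Rank (mRNA, protein) pairs by protein abundance, then compute r for the
    `start` lowest-abundance genes and for each progressively larger subset."""
    ranked = sorted(pairs, key=lambda p: p[1])
    results = []
    for k in range(start, len(ranked) + 1):
        mrna = [m for m, _ in ranked[:k]]
        prot = [p for _, p in ranked[:k]]
        results.append((k, pearson(mrna, prot)))
    return results


# Hypothetical (mRNA copies/cell, protein copies/cell) values, NOT data from the study:
# five low-abundance genes with scattered values plus one very abundant gene.
toy = [(1.0, 9000.0), (4.0, 3000.0), (2.0, 12000.0),
       (5.0, 6000.0), (3.0, 15000.0), (300.0, 800000.0)]

for k, r in cumulative_correlation(toy, start=5):
    print(k, round(r, 3))
```

In this toy set the five low-abundance genes are negatively correlated on their own, yet adding the single abundant gene pushes the coefficient close to 1, which is the kind of distortion the paragraph above describes.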

Correlation of protein and mRNA expression levels with 
codon bias. Codon bias is the propensity for a gene to utilize 
the same codon to encode an amino acid even though other 
codons would insert the identical amino acid in the growing 
polypeptide sequence. It is further thought that highly ex- 
pressed proteins have large codon biases (3). To assess the 
value of codon bias for predicting mRNA and protein levels in 
exponentially growing yeast cells, we plotted the two experi- 
mental sets of data versus the codon bias (Fig. 7). The distri- 
bution patterns for both mRNA and protein levels with respect 
to codon bias were highly similar. There was high variability in 
the data within the codon bias range of 0.8 to 1.0. Although a 
large codon bias generally resulted in higher protein and mes- 
sage expression levels, codon bias did not appear to be predic- 
tive of either protein levels or mRNA levels in the cell. 

DISCUSSION 

The desired end point for the description of a biological 
system is not the analysis of mRNA transcript levels alone but 
also the accurate measurement of protein expression levels and 
their respective activities. Quantitative analysis of global 
mRNA levels currently is a preferred method for the analysis 
of the state of cells and tissues (11). Several methods which
either provide absolute mRNA abundance (34, 35) or relative
mRNA levels in comparative analyses (20, 27) have been described
elsewhere. The techniques are fast and exquisitely sensitive
and can provide mRNA abundance for potentially any
expressed gene. Measured mRNA levels are often implicitly or
explicitly extrapolated to indicate the levels of activity of the
corresponding protein in the cell. Quantitative analysis of protein
expression levels (proteome analysis) is much more time-consuming
because proteins are analyzed sequentially one by
one and is not general because analyses are limited to the
relatively highly expressed proteins. Proteome analysis does,
however, provide types of data that are of critical importance
for the description of the state of a biological system and that
are not readily apparent from the sequence and the level of
expression of the mRNA transcript. This study attempts to
examine the relationship between mRNA and protein expression
levels for a large number of expressed genes in cells
representing the same state.

[FIG. 4 caption, partly lost in extraction] ...ing highly expressed proteins generally have large codon bias values. (A) Distri... genes. (B) Distribution of the genes from identified proteins in this study based
on codon bias. No genes with codon bias values less than 0.1 were detected in this
study. (C) Distribution of identified proteins in this study based on predicted
half-life (estimated by N-end rule).

Limits in the sensitivity of current protein analysis technology
precluded a completely random sampling of yeast proteins.
We therefore based the study on those proteins visible by silver
staining on a 2D gel. Of the more than 1,000 visible spots, 156
were chosen to include the entire range of molecular weights,
isoelectric focusing points, and staining intensities displayed on
the 2D protein pattern. The genes identified in this study
shared a number of properties. First, all of the proteins in this
study had a codon bias of greater than 0.1 and 93% were
greater than 0.2 (Fig. 4B). Second, with few exceptions, the
proteins in this study had long predicted half-lives according to
the N-end rule (Fig. 4C). Third, low-abundance proteins with
regulatory functions such as transcription factors or protein
kinases were not identified.

FIG. 5. Correlation between protein and mRNA levels for 106 genes in yeast growing at log phase with glucose as a carbon source. mRNA and protein levels were ... 69% of the original data set. The Pearson product moment correlation for the entire data set was 0.935. The correlation for the inset containing 73 proteins (69%) was
only 0.356.

Because the population of proteins used in this study ap- 
pears to be fairly homogeneous with respect to predicted half- 
life and codon bias, it might be expected that the correlation of 
the mRNA and protein expression levels would be stronger for 
this population than for a random sample of yeast proteins. We 
tested this assumption by evaluating the correlation value if 
different subsets of the available data were included in the 
calculation. The 106 proteins were ranked from lowest to high- 
est protein expression level, and the trend in the correlation 
value was evaluated by progressively including more of the 
higher-abundance proteins in the calculation (Fig. 6). The cor- 
relation value when only the lower-abundance 40 to 93 pro- 
teins were examined was consistently between 0.1 and 0.4. If 
the 11 most abundant proteins were included, the correlation 
steadily increased to 0.94. We therefore expect that the corre- 
lation for all yeast proteins or for a random selection would be 
less than 0.4. The observed level of correlation between 
mRNA and protein expression levels suggests the importance
of posttranslational mechanisms controlling gene expression.
Such mechanisms include translational control (15) and con- 
trol of protein half-life (33). Since these mechanisms are also 
active in higher eukaryotic cells, we speculate that there is no 
predictive correlation between steady-state levels of mRNA 
and those of protein in mammalian cells. 

Like other large-scale analyses, the present study has several 
potential sources of error related to the methods used to de- 
termine mRNA and protein expression levels. The mRNA 
levels were calculated from frequency tables of SAGE data. 
This method is highly quantitative because it is based on actual 
sequencing of unique tags from each gene, and the number of 
times that a tag is represented is proportional to the number of 
mRNA molecules for a specific gene. This method has some 
limitations including the following: (i) the magnitude of the 
error in the measurement of mRNA levels is inversely propor- 
tional to the mRNA levels, (ii) SAGE tags from highly similar 
genes may not be distinguished and therefore are summed, (iii) 
some SAGE tags are from sequences in the 3' untranslated 
region of the transcript, (iv) incomplete cleavage at the SAGE 
tag site by the restriction enzyme can result in two tags repre- 
senting one mRNA, and (v) some transcripts actually do not 
generate a SAGE tag (34, 35). 

For the SAGE method, the error associated with a value
increases with a decreasing number of transcripts per cell. The
conclusions drawn from this study are dependent on the quality
of the mRNA levels from previously published data (35).
Since more than 65% of the mRNA levels included in this
study were calculated to 10 copies/cell or less (40% were less
than 4 copies/cell), the error associated with these values may
be quite large. The mRNA levels were calculated from more
than 20,000 transcripts. Assuming that the estimate of 15,000
mRNA molecules per cell is correct (16), this would mean that
mRNA transcripts present at only a single copy per cell would
be detected 72% of the time (35). The mRNA levels for each
gene were carefully scrutinized, and only mRNA levels for
which a high degree of confidence existed were included in the
correlation value.

FIG. 6. Effect of highly abundant proteins on Pearson product moment correlation coefficient for mRNA and protein abundance in yeast. The set of 106 genes was
ranked according to protein abundance, and the correlation value was calculated by including the 40 lowest-abundance genes and then progressively including the
remaining 66 genes in order of abundance. The correlation value climbs as the final 11 highly abundant proteins are included.
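
The single-copy detection estimate for SAGE can be approximated with a simple sampling model. The Python sketch below assumes that each sequenced tag is an independent draw from the cell's mRNA pool, which is a simplification and not necessarily the calculation used in reference 35. The function name is hypothetical. With the round numbers quoted in the text (20,000 tags, 15,000 mRNA molecules per cell) this model gives roughly 0.74, close to but not identical with the 72% cited.

```python
def detection_probability(tags_sequenced: int, mrna_per_cell: int,
                          copies_per_cell: float = 1.0) -> float:
    """Probability that a transcript is seen at least once among the tags,
    assuming independent draws with per-draw probability copies/mrna_per_cell."""
    p_single = copies_per_cell / mrna_per_cell
    return 1.0 - (1.0 - p_single) ** tags_sequenced


# Round figures from the text; the exact tag count in the source data may differ.
print(round(detection_probability(20_000, 15_000), 3))
```

Under the same model, the chance of missing a transcript falls off quickly with abundance, which is consistent with the statement that measurement error is largest for the lowest-copy transcripts.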

Protein abundance was determined by metabolic radiolabeling
with [35S]methionine. The calculation required knowledge
of three variables: the number of methionines in the mature 
protein, the radioactivity contained in the protein, and the 
specific activity of the radiolabel normalized per methionine. 
The number of methionines per protein was determined from 
the amino acid sequence of the proteins identified by tandem 
mass spectrometry. For some proteins, it was not known 
whether the methionine of the nascent polypeptide was pro- 
cessed away. The N termini of those proteins were predicted 
based on the specificity of methionine aminopeptidase (31). If 
the N-terminal processing did not conform to the predicted 
specificity of processing enzymes, the calculation of the num- 
ber of methionines would be affected. This discrepancy would 
affect most the quantitation of a protein with a very low num- 
ber of methionines. The average number of calculated methi- 
onines per protein in this study was 7.2. We therefore expect 
the potential for erroneous protein quantitation due to un- 
usual N-terminal processing to be small. 



The amount of radioactivity contained in a single spot might 
be the sum of the radioactivity of comigrating proteins. Be- 
cause protein identification was based on tandem mass spec- 
trometric techniques, comigrating proteins could be identified. 
However, comigrating proteins were rarely detected in this 
study, most likely because relatively small amounts of total 
protein (40 µg) were initially loaded onto the gels, which resulted
in highly focused spots containing generally 1 to 25 ng of
protein. Because of the relatively small amount loaded, the 
concentrations of any potentially comigrating protein would 
likely be below the limit of detection of the mass spectrometry 
technique used in this study (1 to 5 ng) and below the limit of 
visualization by silver staining (1 to 5 ng). In the overwhelming 
majority of the samples analyzed, numerous peptides from a 
single protein were detected. It is assumed that any comigrat- 
ing proteins were at levels too low to be detected and that their 
influence in the calculation would be small. 

The specific activity of the radiolabel was determined by
relating the precise amount of protein present in selected spots
of a parallel gel, as determined by quantitative amino acid
composition analysis, to the number of methionines present in
the sequence of those proteins and the radioactivity determined
by liquid scintillation counting. It is possible that the
resulting number might be influenced by unavoidable losses
inherent in the amino acid analysis procedure applied. Because
four different proteins were utilized in the calculation and the
experiment was done in duplicate, the specific activity calculated
is thought to be highly accurate. Indeed, the specific
activities calculated for each of the four proteins varied by less
than 10%. Any inconsistencies in the calculation of the specific
activity would result in differences in the absolute levels calculated
but not in the relative numbers and would therefore not
influence the correlation value determined.

FIG. 7. Relationship between codon bias and protein and mRNA levels in this study. Yeast mRNA and protein expression levels were calculated as described in
Materials and Methods. The data represent the same 106 genes as in Fig. 5.

The protein quantitative method used eliminates a number 
of potential errors inherent in previous methods for the quan- 
titation of proteins separated by 2DE, such as preferential 
protein staining and bias caused by inequalities in the number 
of radiolabeled residues per protein. Any 2D gel-based method 
of quantitation is complicated by the fact that in some cases the 
translation products of the same mRNA migrated to different 
spots. One major reason is posttranslational modification or 
processing of the protein. Also, artifactual proteolysis during 
cell lysis and sample preparation can lead to multiple resolved 
forms of the protein. In such cases, the protein levels of spots 
coded for by the same mRNA were pooled. In addition, the 
existence of other spots coded for by the same mRNA that 
were not analyzed by mass spectrometry or that were below the 
limit of detection for silver staining cannot be ruled out. How- 
ever, since this study is based on a class of highly expressed 
proteins, the presence of undetected minor spots below silver 
staining sensitivity corresponding to a protein analyzed in the 
study would generally cause a relatively small error in protein 
quantitation. 

Codon bias is a measure of the propensity of an organism to 
selectively utilize certain codons which result in the incorpo- 
ration of the same amino acid residue in a growing polypeptide 
chain. There are 61 possible codons that code for 20 amino 
acids. The larger the codon bias value, the smaller the number 
of codons that are used to encode the protein (19). It is
thought that codon bias is a measure of protein abundance
because highly expressed proteins generally have large codon 
bias values (3, 13). 

Nearly all of the most highly expressed proteins had codon 
bias values of greater than 0.8. However, we detected a number 
of genes with high codon bias and relatively low protein abundance
(Fig. 7). For example, the expressed gene with both the
second largest protein and mRNA levels in the study was
ENO2_YEAST (775,000 and 289.1 copies/cell, respectively).
ENO1_YEAST was also present in the gel at much lower
protein and mRNA levels (44,200 and 0.7 copies/cell, respectively).
The codon bias values for ENO2 and ENO1 are similar
(0.96 and 0.93, respectively), but the expression of the two
genes is differentially regulated. Specifically, ENO1_YEAST is
glucose repressed (6) and was therefore present in low abundance
under the conditions used. Other genes with large codon
bias values that were not of high protein abundance in the gel
include EFT1, TIF1, HXK2, GSP1, EGD2, SHM2, and TAL1.
We conclude that merely determining the codon bias of a gene 
is not sufficient to predict its protein expression level. 

Interestingly, codon bias appears to be an excellent indicator 
of the boundaries of current 2D gel proteome analysis tech- 
nology. There are thousands of genes with expressed mRNA 
and likely expressed protein with codon bias values less than 
0.1 (Fig. 4A). In this study, we detected none of them, and only 
a very small percentage of the genes detected in this study had 
codon bias values between 0.1 and 0.2 (Fig. 4B). Indeed, in 
every examined yeast proteome study (5, 7, 13, 28) where the 
combined total number of identified proteins is 300 to 400, this 
same observation is true. It is expected that for the more 
complex cells of higher eukaryotic organisms the detection of
low-abundance proteins would be even more challenging than
for yeast. This indicates that highly abundant, long-lived pro- 
teins are overwhelmingly detected in proteome studies. If pro- 
teome analysis is to provide truly meaningful information 
about cellular processes, it must be able to penetrate to the 
level of regulatory proteins, including transcription factors and 
protein kinases. A promising approach is the use of narrow- 
range focusing gels with immobilized pH gradients (IPG) (23). 
This would allow for the loading of significantly more protein 
per pH unit covered and also provide increased resolution of 
proteins with similar electrophoretic mobilities. A standard pH 
gradient in an isoelectric focusing gel covers a 7-pH-unit range 
(pH 3 to 10) over 18 cm. A narrow-range focusing gel might 
expand the range to 0.5 pH units over 18 cm or more. This 
could potentially increase by more than 10-fold the number of 
proteins that can be detected. Clearly, current proteome tech- 
nology is incapable of analyzing low-abundance regulatory pro- 
teins without employing an enrichment method for relatively 
low-abundance proteins. In conclusion, this study examined 
the relationship between yeast protein and message levels and 
revealed that transcript levels provide little predictive value 
with respect to the extent of protein expression.

...numerous molecular biology techniques including qualitative Polymerase Chain Reaction (PCR)...

Chromosomal aberrations, such as gene amplification, and chromosomal translocations are
important markers of specific types of cancer and lead to the aberrant expression of specific 
genes and their encoded polypeptides, including over-expression and under-expression. For 
example, gene amplification is a process in which specific regions of a chromosome are 
duplicated, thus creating multiple copies of certain genes that normally exist as a single copy. 
Gene under-expression can occur when a gene is not transcribed into mRNA. In addition, 
chromosomal translocations occur when two different chromosomes break and are rejoined to 
each other, resulting in a chimeric chromosome which displays a different expression
pattern relative to the parent chromosomes. Amplification of certain genes such as Her2/Neu
[Singleton et al., Pathol. Annu., 27 Pt 1:165-190], or chromosomal translocations such as t(5;14)
[Grimaldi et al., Blood, 73(8):2081-2085 (1989); Meeker et al., Blood, 76(2):285-289 (1990)], give
cancer cells a growth or survival advantage relative to normal cells, and might also provide a
mechanism of tumor cell resistance to chemotherapy or radiotherapy. When the chromosomal
aberration results in the aberrant expression of a mRNA and the corresponding gene product (the
polypeptide), as it does in the aforementioned cases, the gene product is a promising target for 
cancer therapy, for example, by the therapeutic antibody approach. 

5. Comparison of gene expression levels in normal versus diseased tissue has 
important implications both diagnostically and therapeutically. For example, those who work in 
this field are well aware that in the vast majority of cases, when a gene is over-expressed, as 
evidenced by an increased production of mRNA, the gene product or polypeptide will also be
over-expressed. It is unlikely that one identifies increased mRNA expression without associated 
increased protein expression. This same principle applies to gene under-expression. When a 
gene is under-expressed, the gene product is also likely to be under-expressed. Stated in another 
way, two cell samples which have differing mRNA concentrations for a specific gene are 
expected to have correspondingly different concentration of protein for that gene. Techniques 
used to detect mRNA, such as Northern Blotting, Differential Display, in situ hybridization, 
quantitative PCR, Taqman, and more recently Microarray technology all rely on the dogma that a 
change in mRNA will represent a similar change in protein. If this dogma did not hold true then 
these techniques would have little value and not be so widely used. The use of mRNA
quantitation techniques has identified a seemingly endless number of genes which are
differentially expressed in various tissues and these genes have subsequently been shown to have 
correspondingly similar changes in their protein levels. Thus, the detection of increased mRNA 
expression is expected to result in increased polypeptide expression, and the detection of 
decreased mRNA expression is expected to result in decreased polypeptide expression. The 
detection of increased or decreased polypeptide expression can be used for cancer diagnosis and 
treatment. 

6. However, even in the rare case where the protein expression does not correlate 
with the mRNA expression, this still provides significant information useful for cancer diagnosis 
and treatment. For example, if over- or under-expression of a gene product does not correlate 
with over- or under-expression of mRNA in certain tumor types but does so in others, then 
identification of both gene expression and protein expression enables more accurate tumor 
classification and hence better determination of suitable therapy. In addition, absence of over- or
under-expression of the gene product in the presence of a particular over- or under-expression of
mRNA is crucial information for the practicing clinician. For example, if a gene is over-expressed
but the corresponding gene product is not significantly over-expressed, the clinician accordingly
will decide not to treat a patient with agents that target that gene product.

7. I hereby declare that all statements made herein of my own knowledge are true and 
that all statements made on information or belief are believed to be true, and further that these 
statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States
Code and that such willful statements may jeopardize the validity of the application or any 
patent issued thereon.

The t(5;14) Chromosomal Translocation in a Case of Acute Lymphocytic
Leukemia Joins the Interleukin-3 Gene to the Immunoglobulin Heavy Chain Gene

By J. Christopher Grimaldi and Timothy C. Meeker 



Chromosomal translocations have proven to be important
markers of the genetic abnormalities central to the pathogenesis
of cancer. By cloning chromosomal breakpoints
one can identify activated proto-oncogenes. We have studied
a case of B-lineage acute lymphocytic leukemia (ALL)
that was associated with peripheral blood eosinophilia. The
chromosomal translocation t(5;14)(q31;q32) from this
sample was cloned and studied at the molecular level. This
translocation joined the immunoglobulin heavy chain joining
(Jh) region to the promoter region of the interleukin-3
(IL-3) gene in opposite transcriptional orientations. The
data suggest that activation of the IL-3 gene by the
enhancer of the immunoglobulin heavy chain gene may play
a central role in the pathogenesis of this leukemia and the
associated eosinophilia.
© 1989 by Grune & Stratton, Inc.

KARYOTYPIC STUDIES of leukemia and lymphoma
have identified frequent nonrandom chromosomal
translocations. Some of these translocations juxtapose the
immunoglobulin heavy chain (IgH) gene with important
proto-oncogenes, such as c-myc and bcl-2. In this way, the
IgH gene can activate proto-oncogenes, resulting in disordered
gene expression and a step in the development of
cancer. The investigation of additional nonrandom translocations
into the IgH locus allows us to identify new genes
promoting the generation of leukemia and lymphoma.

Fig 1. DNA blots of the leukemia sample. The restriction
fragment pattern of normal human DNA (N) and the leukemia
sample (L) were compared using a human Jh probe. Rearranged
bands are indicated by arrows. Sample L exhibits a single rearranged
band with both HindIII/EcoRI and Sau3A restriction
digests. The rearranged bands are less intense than the other
bands because the majority of cells in the sample represent normal
bone marrow elements.

A distinct subtype of acute lymphocytic leukemia (ALL) 
has been characterized by B-lineage phenotype, associated 
eosinophilia in the peripheral blood, and a t(5;14)(q31;q32) 
chromosomal translocation.3,4 This syndrome probably
occurs in <1% of all patients with ALL. We hypothesized
that the cloning of the translocation characteristic of this 
leukemia might allow the identification of an important gene 
on chromosome 5 that plays a role in the evolution of this 
disease. In this report we demonstrate that the interleukin-3 
gene (IL-3) and the IgH gene are joined by this transloca- 
tion. 

MATERIALS AND METHODS 

Sample and DNA blots. A bone marrow aspirate from a repre-
sentative patient with ALL (L1 morphology by French-American-
British [FAB] criteria), peripheral eosinophilia (up to 20,000 per
microliter with a normal value of <350 per microliter) and a
t(5;14)(q31;q32) translocation was studied. Using published meth-
ods, genomic DNA was isolated and DNA blots were made.5 Briefly,
10 µg of high molecular weight (mol wt) DNA were digested using
an appropriate restriction enzyme and electrophoresed on a 0.8%
agarose gel. The gel was stained with ethidium bromide, photo-
graphed, denatured, neutralized, and transferred to Hybond (Amer-
sham, Arlington Heights, IL). After treatment of the filter with
ultraviolet light, hybridization was performed. The filter was washed
to a final stringency of 0.2% saturated sodium citrate (SSC) and
0.1% sodium lauryl sulfate (SDS) and exposed to film. The human
Jh probe has been previously reported.6

Genomic library. The genomic library was made using pub-
lished methods. 3 Approximately 100 µg of high mol wt genomic
DNA were partially digested with the Sau3A restriction enzyme.
Fragments from 9 to 23 kilobases (kb) in size were isolated on a
sucrose gradient and ligated into phage EMBL3A (Stratagene, San
Diego). Recombinant phage were packaged, plated, and screened as
previously reported.5

DNA sequencing. Fragments for sequencing were cloned into 
M13 vectors and sequenced by the chain termination method using
Sequenase (United States Biochemical, Cleveland). 7 All sequence 
data were derived from both strands. 

RESULTS 

We studied a bone marrow sample from a patient with
ALL and associated peripheral eosinophilia. Karyotypic
analysis showed the characteristic t(5;14)(q31;q32) translo-
cation. These features define a distinctive subtype of ALL.3,4
The leukemic cells were analyzed for cell surface phenotype
by immunofluorescence. They were positive for B1 (CD20),
B4 (CD19), cALLA (CD10), HLA-DR, and terminal
deoxynucleotidyl transferase (Tdt), but negative for surface
immunoglobulin. This phenotypic profile describes an imma-
ture cell from the B-lymphocytic lineage.8

The leukemia DNA was analyzed by Southern blotting for 
rearrangements of the IgH gene. Using a human immuno- 
globulin Jh probe, a single rearranged band was detected by 
EcoRI, HindIII, SstI, Sau3A, and EcoRI plus HindIII
restriction digests, suggesting rearrangement of one allele 
(Fig 1). The immunoglobulin Jh region from the other allele 
was presumably either deleted or in the germline configura- 
tion. 
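The logic behind reading a rearrangement from a DNA blot can be illustrated with a small sketch: when a restriction site flanking the probed region is lost or displaced, the probe-hybridizing fragment changes size. The example below is only an illustration under stated assumptions. The two toy sequences are hypothetical and are not taken from the leukemia sample; the recognition sites and cut positions are the standard ones for EcoRI, HindIII, and Sau3A.

```python
# Recognition sites (5'->3') and the cut offset within each site,
# for enzymes named in the text.
SITES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "Sau3A": ("GATC", 0),      # ^GATC
}


def digest(seq, enzymes):
    """Return fragment lengths after cutting a linear sequence with the given enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = SITES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    bounds = [0] + sorted(c for c in cuts if 0 < c < len(seq)) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy sequences (assumptions for illustration only).
germline = "CCCC" + "GAATTC" + "A" * 20 + "GAATTC" + "TTTT"
rearranged = "CCCC" + "GAATTC" + "A" * 20 + "GGGGGG" + "TTTT"  # second site lost

print(digest(germline, ["EcoRI"]))
print(digest(rearranged, ["EcoRI"]))
```

In the toy case the loss of one site merges two fragments into a single larger one, which is the kind of size shift that appears as a rearranged band.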

We hypothesized that the t(5;14)(q31;q32) juxtaposed a
growth-promoting gene on chromosome 5 with the immuno-
globulin Jh region on chromosome 14. Therefore, a genomic
library was made from the leukemic sample and screened
with a Jh probe. Fifteen distinct positive clones were isolated
and screened for the presence of the rearranged Sau3A
fragment that was detected by DNA blotting. By this
analysis, five clones appeared to represent the rearranged
allele identified by DNA blots. One of these clones (clone no.
4) was chosen for further study and a detailed restriction
map was generated. The EcoRI, HindIII/EcoRI, and SstI
fragments from clone no. 4 that hybridized to the human Jh
probe were also identical in size to the rearranged fragments
from the leukemia sample, confirming that clone no. 4
represented the rearranged leukemic allele.

Phage clone no. 4 contained 3.7 kb of unknown origin 
joined to the IgH gene in the region of Jh4 (Fig 2). The IgH 
gene from Jh4 to the Cmu region appeared to be in germline 
configuration. Previously, the gene encoding hematopoietic
growth factor IL-3 had been mapped to chromosome 5q31, so
it was suspected that clone no. 4 might contain part of this
gene. When the restriction map of human IL-3 and clone
no. 4 were compared, they were identical for more than 3 kb
(Fig 2).

We confirmed the juxtaposition of the IL-3 gene and the
IgH gene by nucleic acid sequencing of the subcloned
BstEII/HpaI fragment (Fig 2). The sequence of this frag-
ment showed no disruption of the protein coding region or the
messenger RNA of the IL-3 gene. The break in the IL-3 gene
occurred in the promotor region, 452 base pairs (bp)
upstream of the transcriptional start site (position 64, Fig
3A). The break in the IgH gene occurred 2 bp upstream of
the Jh4 region. Between the two breaks, 25 bp of uncertain
origin (putative N sequence) were inserted.13,14 No sequences
homologous to the immunoglobulin heptamer and nonamer
could be identified in the IL-3 sequence (Fig 3B). Therefore,
nucleic acid sequencing confirmed the juxtaposition of the
IL-3 gene and the IgH gene. The sequence data clearly
showed that the genes were positioned in opposite transcrip-
tional orientations (head-to-head).

Available data also allowed us to determine the normal
positions of the IL-3 gene and the GM-CSF gene in relation
to the centromere of chromosome 5 (Fig 4). The IgH gene is
known to be positioned with the variable regions toward the
telomere on chromosome 14q.14,15 It has also been shown that
GM-CSF maps within 9 kb of IL-3 in the same transcrip-
tional orientation.16 Using this information and assuming a
simple translocation event in our sample, we can conclude
that the IL-3 gene is normally more centromeric, and the
GM-CSF gene more telomeric on chromosome 5q (Fig 4).
Furthermore, both are transcribed with their 5' ends toward
the centromere.

DISCUSSION

In this report we have cloned a unique chromosomal
translocation that appears to be a consistent feature of a rare,
yet distinct, clinical form of acute leukemia. This transloca-
tion joined the promotor of the IL-3 gene to the IgH gene.
Except for the altered promotor, the IL-3 gene appeared

+ 

A 5 9 GGTGACC AGGGTTCCCTGGCCCCAGTAGTCAAAGTAGTAGAGGTAATTCATCATAGCTGCGGATTAGCAGCGTGACCGGC Q a 
3 ' CCACTGGTCCCAAGGGACCGGGGTCATCAGTTTCA BW 

• • • • 

5 1 TACCAGACAAACTCTCATCTGTTCGAGTGGCCTCCTGGCCACCCACCAGGACCAAGCAGGGCGGGCAGCAGAGGGCCAGG , fi n 
3 ■ ATGGTCTGTCTGAGAGTAGACAAGGTCACCGGAGGACCGGTGGGTGGTCCTGGTTCGTCCCGCCCGTCGTCTCCCGGTCC 

* ********* * * 

3 'CATCAGCTCCACTACCGTCTACTCT^ 240 

5 ■ GGGGTCCTCTCACCTGCTGCCATGCTTCCCATC^ 

3 ■ CCCCAGGAGAGTGGACGACGGTACGAAGGGTAGAGAGTAGGAGGAACTGTTCTACTTCACTATGGCAAATTCAOTAGAW 

„ . # ********* 

5 1 TTTCTTGTTTC ACTGATCTTGAGTACTAGAAAGTC ATGGATGAATAATTACGTCTCTGGTTTTCTATGGAGGTTCC ATGT Aaa 
3 1 AAAGAAC AAAGTGACTAGAACTCAIGATCTTTCAGTACCTACTTATTAATGCAGACACC 4 B a 

5 ' CAGATAAAGATCCTTCCGACGCCTGCCCCACACCACCACCTCCCCCCGCCTTGCCCGGGGTTC . „- 

3 • GTCTATTTCTAGGAAGGCTGCGGACGGGGTGTGGTGGTGGAGGGGGGCGGAACGGGCCCCAACACCCGTGGAACGACGAC 4 BU 

5 1 CACATaTftAGGCGGGAGGTTGTTGCCAACTCTTC AGAGCCCCACGAAGGACCAGAACAAGACAGAGTGCCTCCTGCCGAT q r 
3 1 GTGTATATTCCGCCCTCCAACAACGGTTGAGAAGTCTCGGGGTGCTTCCTGGTCTTCTTCTGTCTC 

5 1 CCAAACATOAGCCGCCTGCCCGTCCTGCTCCTGCTCCAACTC r . r 

3 ' GGTTTGTACTCGGCGGACG<K^AGGACGAGGACGAGGTTGAGGACCAGGCGGGGCCTGAGGTTCGAGGGTACTGGGTCTG 

5 1 AACGTCCTTGAAGACAAGCTGGGTTAAC 3' -- ft 
3 1 TTGCAGGAACTTCTGTTCGACCCAATTG 5' 0OB 

BTa Thd 5 ' TGGCCCCAGTAGTCAAAGTAGTCACATTGTGGGAGGCCCCATTAAGGGGTGCACAAAAACCTGACTCTC 
Agun 3 ' ACCGGGGTCATCAGTTTCATCAGTGTAACA CCCTCCGGGGTAATTCCCCACG TGTTTTTGG ACTGAGAG 

++++++++++++++++++++++ 

n * A 5 § TGGCCCCAGTAGTCAAAGTAGTAGAGGTAATTCATCATAGCTGCGGATTAGCAGCGTGACCGGCTACCA 
■ 3 1 ACCGGGGTC ATCAGTTTCATC ATC TCCATTAAGT AGTATCGACGCCTA ATCGTCGC ACTG GC CG ATG GT 
5 1 GGCACCAAGAGATGTGCTTCTCAGAGCCTGAGGCTGAACGTGGATGTTTAGCAGCGTGACCGGCTACCA 
3 1 CCGTGGTTCTCTACACGAAGAGTCTCGGACTCCGACTTGCACCTACAAATCGTCGCACTGGCCGATGGT 



Fig 3. Sequence of t(5;14)(q31;q32) breakpoint region. (A) Nucleotide sequence of the BstEII/HpaI fragment indicated on Fig 2.
Nucleotides 1 to 36 represent the Jh4 coding region underlined on the coding strand.9 Nucleotides 39 to 63 are a putative N region. The
sequence from position 64 to 668 is that of the germline IL-3 gene. The IL-3 TATA box (485), transcription start (516), and initiation
methionine (567) are underlined. Two proposed regulatory sequences in the promotor are marked by asterisks (positions 182 and 389). (B)
Comparative sequence of the t(5;14)(q31;q32) breakpoint region. The Ig Jh4 region is shown with its coding region, heptamer, and
nonamer underlined. Clone no. 4 is shown with putative N region sequences underlined. The IL-3 sequence is also shown. A plus sign (+)
denotes the identical nucleotide between sequences. No heptamer or nonamer is identified in the IL-3 sequence.

intact as no deletions, insertions, or point mutations were
detected by restriction mapping of the entire gene and
sequencing of part of the gene. The IgH gene has been
truncated at the Jh4 region, which places the immunoglobu-
lin enhancer within 2.5 kb of the IL-3 gene. This leads to
the hypothesis that the enhancer is increasing transcription
of a structurally normal IL-3 gene. The same mechanism is

important for activation of the c-myc gene in cases of
Burkitt's lymphoma. An alternate hypothesis is that the
disruption of normal IL-3 promotor elements is crucial
to the activation of the IL-3 gene.

The activation of the IL-3 gene suggests that an
autocrine loop is important for the pathogenesis of this
leukemia. Over-expression of the IL-3 gene coupled with
the presence of the IL-3 receptor in these cells could account
for a strong stimulus for proliferation. In this regard, there
are data indicating that immature B-lineage lymphocytes
and B-lineage leukemias may express the IL-3 receptor.21,22

An additional feature of this type of leukemia is the 
dramatic eosinophilia, consisting of mature forms. It has 
been hypothesized that the eosinophils do not arise from the 
malignant clone, but are stimulated by the tumor ™ M 
Because of the known effect of IL-3 on eosinophil differentia- 
tion, secretion of high levels of IL-3 by leukemic cells might 
have a role in the eosinophilia in this type of leukemia 12 

The data suggest that the recombination mechanism that 
is active in the IgH gene during normal differentiation has a 
role in this translocation.13,14 This is supported by the break-
point location at the 5' end of Jh4 and the presence of
putative N-region sequences. On the other hand, no recombi- 
nation signal sequence (heptamer and nonamer) was found 
in this region on chromosome 5, suggesting that additional 
factors also played a role. Further studies will elucidate the 
mechanism of this and other translocations. 
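The breakpoint coordinates quoted for the Case 1 clone can be cross-checked with simple arithmetic. The sketch below is only a consistency check, assuming the 1-based, inclusive numbering used in the Fig 3 legend; the variable names are illustrative and do not come from the paper.

```python
# Coordinates as stated in the text and the Fig 3 legend (1-based, inclusive).
JH4_CODING = (1, 36)      # Jh4 coding region
N_REGION = (39, 63)       # putative N region
IL3_FIRST_BASE = 64       # first germline IL-3 nucleotide in the clone
BREAK_TO_START_BP = 452   # break lies 452 bp upstream of the transcription start


def span_length(region):
    """Length of an inclusive (first, last) coordinate pair."""
    first, last = region
    return last - first + 1


n_length = span_length(N_REGION)
gap_between_jh4_and_n = N_REGION[0] - JH4_CODING[1] - 1
transcription_start = IL3_FIRST_BASE + BREAK_TO_START_BP

print(n_length, gap_between_jh4_and_n, transcription_start)
```

Under these assumptions the N region is 25 bp long, 2 bp separate it from the Jh4 coding region, and the transcription start falls at position 516, all of which agree with the figures given in the Results.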

In the leukemia we studied, it is possible that the immuno- 
globulin enhancer also activates the GM-CSF gene, since 
this gene is probably positioned only 14 kb away (Fig 4). This 
is known to be within the range of enhancer activation. 2 * The 
interleukin-5 (IL-5) gene maps to chromosome 5q31.
Deregulation of the IL-5 gene by this translocation would act
synergistically with IL-3 in the stimulation of eosinophil
proliferation and differentiation.27 These and other questions
will be answered by the study of more patient samples. We
plan to determine whether the t(5;14)(q31;q32) transloca- 
tion is capable of activating multiple lymphokines simulta- 
neously and whether they cooperate in the generation of this 
leukemia.

RAPID COMMUNICATION

Activation of the Interleukin-3 Gene by Chromosome Translocation in Acute
Lymphocytic Leukemia With Eosinophilia

By Timothy C. Meeker, Dan Hardy, Cheryl Willman, Thomas Hogan, and John Abrams 



The t(5;14)(q31;q32) translocation from B-lineage acute
lymphocytic leukemia with eosinophilia has been cloned
from two leukemia samples. In both cases, this transloca-
tion joined the IgH gene and the interleukin-3 (IL-3) gene. In
one patient, excess IL-3 mRNA was produced by the
leukemic cells. In the second patient, serum IL-3 levels
were measured and shown to correlate with disease

A NUMBER OF chromosome translocations have been 
associated with human leukemia and lymphoma. In 
many cases the study of these translocations has led to the 
discovery or characterization of proto-oncogenes, such as 
bcl-2, c-abl, and c-myc, that are located adjacent to the
translocation.1,2 It is now widely understood that cancer-
associated translocations disrupt nearby proto-oncogenes.

A distinct subtype of acute leukemia is characterized by 
the triad of B-lineage immunophenotype, eosinophilia, and 
the t(5;14)(q31;q32) translocation.3,4 Leukemic cells from
such patients have been positive for terminal deoxynucleotidyl
transferase (Tdt), common acute lymphoblastic leukemia
antigen (CALLA), and CD19, but negative for surface or
cytoplasmic immunoglobulin. In previous work, we cloned
the t(5;14) breakpoint from one leukemic sample (Case 1)
and determined that the IgH and interleukin-3 (IL-3) genes
were joined by this abnormality.5 In this report, we extend
those findings by showing that the t(5;14)(q31;q32) translo- 
cation from a second leukemia sample (Case 2) has a similar 
structure, and we report our study of growth factor expres- 
sion in these patients. 

MATERIALS AND METHODS 
Samples and Southern blots. Case 1 has been described.5,6
Clinical features of Case 2 have been described in detail. DNA
isolation and Southern blotting was done using previously described
methods.5 Filters were hybridized with an immunoglobulin Jh probe,
a 280 bp BamHI/EcoRI genomic IL-3 fragment, and an IL-3
cDNA probe.7,8

Northern blots. RNA isolation and Northern blotting have been
described. Briefly, Northern blots were done by separating 9 µg
total RNA on 1% agarose-formaldehyde gels. Equal RNA loading in
each lane was confirmed by ethidium bromide staining. Blots were
hybridized with an IL-3 cDNA probe extending to the Xho I site in
exon 5, a 720 bp Sst I/Kpn I probe derived from intron 2 of the IL-3
gene, a 600 bp Nhe I/Hpa I IL-5 cDNA probe, and a 500 bp Pst
I/Nco I granulocyte-macrophage colony stimulating factor (GM-
CSF) cDNA probe.

activity. There was no evidence of excess granulocyte/
macrophage colony stimulating factor (GM-CSF) or IL-5
expression. Our data support the formulation that this
subtype of leukemia may arise in part because of a
chromosome translocation that activates the IL-3 gene,
resulting in autocrine and paracrine growth effects.
© 1990 by The American Society of Hematology.

Polymerase chain reaction. Primers were designed with BamHI
sites for cloning. One primer hybridized to the Jh sequences from the
IgH gene (Primer 144: 5'-TAGGATCCGACGGTGACCAGGGT)
and the other hybridized to the region of the TATA box in the IL-3
gene (Primer 161: 5'-AACAGGATCCCGCCTTATATGTGCAG).
Polymerase chain reaction (PCR) (95°C for 1 minute, 61°C for 30
seconds, and 72°C for 3 minutes) was done using 500 ng genomic
DNA and 50 pmol of each primer in 100 µL containing 67 mmol/L
Tris-HCl pH 8.8, 6.7 mmol/L MgCl2, 10% dimethyl sulfoxide
(DMSO), 170 µg/mL bovine serum albumin (BSA) (fraction V),
16.6 mmol/L ammonium sulfate, 1.5 mmol/L each dNTP and Taq
polymerase (Perkin-Elmer, Norwalk, CT).
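The primer design described above (a BamHI cloning site near the 5' end of each primer, with the IL-3 primer covering the TATA box region) can be checked with a few lines of code. This is a minimal sketch, assuming the primer sequences are exactly as printed in the Methods with the extraction spaces removed; the helper names are hypothetical and are not part of the original work.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence

# Primer sequences as printed in the Methods, with spaces removed.
PRIMERS = {
    "144": "TAGGATCCGACGGTGACCAGGGT",      # Jh primer
    "161": "AACAGGATCCCGCCTTATATGTGCAG",   # IL-3 TATA box region primer
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def site_offset(seq, site=BAMHI_SITE):
    """0-based index of the first occurrence of the site, or -1 if absent."""
    return seq.upper().find(site)


for name, seq in PRIMERS.items():
    print(name, len(seq), site_offset(seq), round(gc_fraction(seq), 2))
```

Each printed primer carries the BamHI site within its first few bases, and the reverse complement of primer 161 contains the TATAAGG motif, consistent with a primer placed over the IL-3 TATA box on the opposite strand.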

Sequencing. Sequencing was done by chain termination in M13
vectors.14 As part of this study, we sequenced a subclone of a normal
IL-3 promoter, covering 598 base pairs from a Sma I site at position
-1240 (with respect to the proposed site of transcription initiation)
to an Nhe I site at position -642. The plasmid containing this region
was a gift from Naoko Arai of the DNAX Research Institute.

Expression in Cos7 cells. A genomic IL-3 fragment from Case 1
was cloned into the pXM expression vector.10 Briefly, the HindIII/
Sal I fragment containing the IL-3 gene was subcloned from the
previously described phage clone 4 into pUC18.5 The 2.6 kb
fragment extending from the Sma I site 61 bp upstream of the IL-3
transcription start to the Sma I site in the polylinker was cloned into
the blunted Xho I site of pXM. The negative control construct was
the pXM vector without insert. Plasmids were introduced into Cos7
cells by electroporation, and supernatant was collected after 48
hours in culture.

TF-1 bioassay. TF-1 cells were passaged in RPMI 1640 supple-
mented with 10% heat-inactivated fetal bovine serum, 2 mmol/L
L-glutamine, and 1 ng/mL human GM-CSF.15 Samples and antibod-
ies were diluted in this same medium lacking GM-CSF but contain-
ing penicillin and streptomycin. A 25 µL volume of serial dilutions of
patient serum was added to wells in a flat bottom 96-well microtiter
plate. Rat anti-cytokine monoclonal antibody in a volume of 25 µL
was added to appropriate wells and preincubated for 1 hour at 37°C.
Fifty microliters of twice washed TF-1 cells were added to each well
giving a final cell concentration of 1 × 10^4 cells per well (final
volume, 100 µL). The plate was incubated for 48 hours. The
remaining cell viability was determined metabolically by the colori-
metric method of Mosmann using a VMax microtiter plate reader
(Molecular Devices, Menlo Park, CA) set at 570 and 650 nm.16

Fig 1. Breakpoint sequences for Case 2. The germline Ig Jh5 region sequence (protein coding and recombination signal
sequences are underlined) is on top, the translocation sequence from Case 2 (PCR primer and N region are underlined)
is in the middle, and the germline IL-3 sequence, which we derived from a normal IL-3 promoter subclone, is on the bottom.
Marks between the sequences indicate positions where the adjacent sequence has the same nucleotide. The sequence
documents the head-to-head joining of the IgH and IL-3 genes. The breakpoint in the IL-3
gene occurred at position -934 (•).

Cytokine immunoassays. These assays used rat monoclonal
anti-cytokine antibodies (10 µg/mL) to coat the wells of a PVC
microtiter plate. The capture antibodies used were BVD3-6G8,
JES1-39D10, and BVD2-23B6, for the IL-3, IL-5, and GM-CSF
assays, respectively. Patient sera were then added (undiluted and
diluted 1:3 for IL-3, undiluted for IL-5, and undiluted and diluted
1:5 for GM-CSF). The detecting immunoreagents used were either
mouse antiserum to IL-3 or nitroiodophenyl (NIP)-derivatized rat
monoclonal antibodies JES1-5A2 and BVD2-21C11, specific for
IL-5 and GM-CSF, respectively. Bound antibody was subsequently
detected with immunoperoxidase conjugates: horseradish peroxidase
(HRP)-labeled goat anti-mouse Ig for IL-3, or HRP-labeled rat (J4
MoAb) anti-NIP for IL-5 and GM-CSF. The chromogenic sub-
strate was 3,3'-azino-bis-benzthiazoline sulfonate (ABTS; Sigma, St
Louis, MO). Unknown values were interpolated from standard
curves prepared from dilutions of the recombinant factors using
Softmax software available with the VMax microplate reader
(Molecular Devices).
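Interpolating an unknown from a standard curve can be sketched as follows. This is an illustration only: it assumes simple piecewise-linear interpolation between neighboring standards, which is not necessarily the curve fit applied by the Softmax software, and the standard values shown are hypothetical rather than data from the study.

```python
from bisect import bisect_left


def interpolate_concentration(absorbance, standards):
    """Piecewise-linear interpolation of concentration from absorbance.

    standards: list of (concentration, absorbance) pairs whose absorbances
    increase with concentration. Returns None outside the curve's range.
    """
    points = sorted(standards, key=lambda p: p[1])
    readings = [a for _, a in points]
    if not readings[0] <= absorbance <= readings[-1]:
        return None  # do not extrapolate beyond the standards
    i = bisect_left(readings, absorbance)
    if readings[i] == absorbance:
        return points[i][0]
    (c0, a0), (c1, a1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (absorbance - a0) / (a1 - a0)


def serum_concentration(absorbance, standards, dilution_factor=1):
    """Correct the interpolated value for the serum dilution (e.g. 3 for 1:3)."""
    value = interpolate_concentration(absorbance, standards)
    return None if value is None else value * dilution_factor


# Hypothetical standards: (pg/mL, absorbance). Not data from the study.
STANDARDS = [(0, 0.05), (100, 0.25), (400, 0.85), (1600, 1.65)]

print(serum_concentration(0.55, STANDARDS, dilution_factor=3))
```

Returning None for readings outside the standards mirrors the usual practice of reporting such samples as below or above the assay range instead of extrapolating.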

RESULTS 

Leukemic DNA from Case 2 was studied by Southern
blotting. When digested with the HindIII restriction enzyme
and hybridized with a human immunoglobulin heavy chain
joining region (Jh) probe, a rearranged fragment at approxi-
mately 14 kb was detected (data not shown). When reprobed
with either of two different IL-3 probes, a rearranged 14 kb



5 afi ^ Cn > ^ lgratb8 *** "arranged Jh fragment, was 
iden^ When leukemic DNA was digested wittSS 
Plus EcoKl, a rearranged Jh fragment was detected at™ 
The IL-3 probes also identified a emigrating fragment of 

zxss rsrrr .-ar Es 

To characterize better the joining of the IL-3 gene and the
immunoglobulin heavy chain (IgH) gene, the polymerase
chain reaction (PCR) was used to clone the translocation.
A Jh primer and an IL-3 primer were designed to produce an
amplified product in the event of a head-to-head translocation.
Control DNA gave no PCR product, whereas Case 2 DNA
yielded a PCR-derived fragment of approximately 980 bp,
which was cloned and sequenced.

The sequence of the translocation clone from Case 2
confirmed the joining of the Jh region with the promoter of
the IL-3 gene in a head-to-head configuration (Fig 1).
Sequence analysis indicated that the breakpoint on chromo-
some 14 was just upstream of the Jh5 coding region. The
breakpoint on chromosome 5 occurred 934 bp upstream of
the putative site of transcription initiation of the IL-3 gene.
We also determined that a putative N sequence of n bp was
inserted between the chromosome 5 and chromosome 14
sequences during the translocation event. Figure 2 shows
the locations of the two cloned breakpoints in relation to the
IL-3 gene. The two chromosome 5 breakpoints were sepa-
rated by less than 500 bp.

Fig 2. Relationship of the chromosome 5 breakpoints to the IL-3 gene. This figure shows the two cloned breakpoints in relation to the
IL-3 gene: one breakpoint at position -462 and the other at -934. In both instances, the
translocations resulted in a head-to-head joining of the IgH gene and the IL-3 gene, leaving the IL-3
gene intact. Boxes denote the five IL-3 exons; restriction enzymes are (B) BamHI, (P) PstI, (H) HindIII.
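The statement that the two chromosome 5 breakpoints lie less than 500 bp apart can be verified from the positions given in the two reports. The sketch below assumes the Case 1 break at 452 bp upstream of the transcription start, as stated in the first report (the Fig 2 legend as extracted reads -462, which leads to the same conclusion), and the Case 2 break at -934. It also checks that the sequenced normal promoter subclone spans the Case 2 breakpoint. Variable names are illustrative.

```python
# Positions relative to the proposed IL-3 transcription start site.
CASE1_BREAK = -452                  # from the first report (legend reads -462)
CASE2_BREAK = -934                  # from the Case 2 sequence analysis
NORMAL_SUBCLONE = (-1240, -642)     # Sma I to Nhe I, sequenced normal promoter

separation = abs(CASE1_BREAK - CASE2_BREAK)
subclone_length = NORMAL_SUBCLONE[1] - NORMAL_SUBCLONE[0]
case2_in_subclone = NORMAL_SUBCLONE[0] <= CASE2_BREAK <= NORMAL_SUBCLONE[1]

print(separation, subclone_length, case2_in_subclone)
```

With these inputs the separation is 482 bp, and the 598 bp subclone length matches the figure quoted in the Methods.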

The genomic structure in Cases 1 and 2 suggested that a
normal IL-3 gene product was over-expressed as a result of
the altered promoter structure. This would predict that the
IL-3 gene on the translocated chromosome was capable of
making IL-3 protein. This prediction was tested by express-
ing a genomic fragment from the translocated allele of Case
1 containing all five IL-3 exons under the control of the SV40
promoter/enhancer in the Cos7 cell line. Cell supernatants
were studied in a proliferation assay using the factor depen-
dent erythroleukemic cell line, TF-1. The supernatants
derived from transfections using the vector plus insert
supported TF-1 proliferation, while supernatants from trans-
fections using the vector alone were negative in this assay
(data not shown). Furthermore, the biologic activity could be
blocked by an antibody to human IL-3 (BVD3-6G8). This
result showed that the translocated allele retained the ability
to make IL-3 mRNA and protein.

The level of expression of IL-3 mRNA in leukemic cells 
from Case 1 was assessed. Northern blotting showed that the 
mature IL-3 mRNA (approximately 1 kb) and a 2.9 kb 
unspliced IL-3 mRNA were excessively produced by the 
leukemia (Fig 3). The 2.9 kb form of the mRNA is also 
present at low levels in normal peripheral blood T lympho- 
cytes after mitogen activation (Fig 3). Several B-lineage 
acute leukemia samples without the t(5;14) translocation 
had undetectable levels of IL-3 mRNA in these experiments. 
In addition, although genes for GM-CSF and IL-5 map close 
to the IL-3 gene and might have been deregulated by the 
translocation, no IL-5 or GM-CSF mRNA could be detected 
in the leukemic sample (data not shown).19,20

Three serum samples from Case 2 were assayed by 
immunoassay for levels of IL-3, GM-CSF, and IL-5 (Table 
1). Serum IL-3 could be detected and correlated with the 
clinical course. When the patient's leukemic cell burden was
highest, the IL-3 level was highest. No serum GM-CSF or
IL-5 could be detected.

Since the IL-3 immunoassay measured only immunoreac-
tive factor, we confirmed that biologically active IL-3 was
present by using the TF-1 bioassay. This bioassay can be
rendered monospecific using appropriate neutralizing mono-
clonal antibodies specific for IL-3, IL-5, or GM-CSF. We
observed that sera from 1-16-84 and 3-14-84 contained TF-1
stimulating activity that could be blocked with anti-IL-3
MoAb (BVD3-6G8), but not with MoAbs to IL-5 (JES1-
39D10) or GM-CSF (BVD2-23B6) (Fig 4; GM-CSF data
not shown). The amount of neutralizable bioactivity in these
two samples correlated very well with the difference in IL-3
levels obtained by immunoassay for these samples. Further-
more, the failure to block TF-1 proliferating activity with
either anti-IL-5 or anti-GM-CSF was consistent with the
inability to measure these factors by immunoassay and



Table 1. Peripheral Blood Counts and Growth Factor Levels
at Different Times in Case 2

Peripheral blood counts (cells/µL)
WBC
Lymphoblasts
Eosinophils
Serum growth factor levels (pg/mL)
IL-3
GM-CSF
IL-5

Peripheral blood counts from Case 2 at three different time points with
the corresponding growth factor levels quantified by immunoassay. The
patient received chemotherapy between 1/16/84 and 3/14/84 to lower
his leukemic burden.3 No serum samples were available for a similar
analysis of Case 1.

Abbreviation: WBC, white blood cells.

Clinical and Pathologic Significance of the
c-erbB-2 (HER-2/neu) Oncogene

Timothy R Singleton and John G. Strickler 



The c-erbB-2 oncogene was first shown to have clinical significance in 1987 by
Slamon et al, who reported that c-erbB-2 DNA amplification in breast carcino-
mas correlated with decreased survival in patients with metastasis to axillary
lymph nodes. Subsequent studies, however, of c-erbB-2 activation in breast
carcinoma reached conflicting conclusions about its clinical significance. This
oncogene also has been reported to have clinical and pathologic implications in
other neoplasms. Our review summarizes these various studies and examines
the clinical relevance of c-erbB-2 activation, which has not been emphasized in
recent reviews. The molecular biology of the c-erbB-2 oncogene has been
extensively reviewed and will be discussed only briefly here.

BACKGROUND

The c-erbB-2 oncogene was discovered in the 1980s by three lines of investiga-
tion. The neu oncogene was detected as a mutated transforming gene in
neuroblastomas induced by ethylnitrosourea treatment of fetal rats. The c-
erbB-2 was a human gene discovered by its homology to the retroviral gene v-
erbB. HER-2 was isolated by screening a human genomic DNA library for
homology with v-erbB. When the DNA sequences were determined subse-
quently, c-erbB-2, HER-2, and neu were found to represent the same gene.
Recently, the c-erbB-2 oncogene also has been referred to as NGL.

The c-erbB-2 DNA is located on human chromosome 17q21 and codes
for c-erbB-2 mRNA (4.6 kb), which is translated into c-erbB-2 protein (p185). This
protein is a normal component of cytoplasmic membranes. The c-erbB-2
oncogene is homologous with, but not identical to, c-erbB-1, which is located
on chromosome 7 and codes for the epidermal growth factor receptor. The c-
erbB-2 protein is a receptor on cell membranes and has intracellular tyrosine
kinase activity and an extracellular binding domain.8,108 Electron microscopy
with a polyclonal antibody detects c-erbB-2 immunoreactivity on cytoplasmic
membranes of neoplasms, especially on microvilli and the non-villous outer cell
membrane.61 In normal cells, immunohistochemical reactivity for c-erbB-2 is
frequently present at the basolateral membrane or the cytoplasmic membrane's
brush border.

There is experimental evidence that c-erbB-2 protein may be involved in
the pathogenesis of breast neoplasia. Overproduction of otherwise normal c-
erbB-2 protein can transform a cell line into a malignant phenotype. Also,
when the neu oncogene containing an activating point mutation is placed in
transgenic mice with a strong promoter for increased expression, the mice
develop multiple independent mammary adenocarcinomas. In other experi-
ments, monoclonal antibodies against the neu protein inhibit the growth (in
nude mice) of a neu-transformed cell line, and immunization of mice with
neu protein protects them from subsequent tumor challenge with the neu-
transformed cell line. Some authors have speculated that the use of antago-
nists for the unknown ligand could be useful in future chemotherapy. Further
review of this experimental evidence is beyond the scope of this article.

The c-erbB-2 activation most likely occurs at an early stage of neoplastic
development. This hypothesis is supported by the presence of c-erbB-2 activa-
tion in both in situ and invasive breast carcinomas. In addition, studies of
metastatic breast carcinomas usually demonstrate uniform c-erbB-2 activation
at multiple sites in the same patient, although c-erbB-2 activation has
rarely been detected in metastatic lesions but not in the primary tumor.
Even more rarely, c-erbB-2 DNA amplification has been detected in a primary
breast carcinoma but not in its lymph node metastasis.5 In patients who have
bilateral breast neoplasms, both lesions have similar patterns of c-erbB-2 activa-
tion, but only a few such cases have been studied.



MECHANISMS OF c-erbB-2 ACTIVATION

The most common mechanism of c-erbB-2 activation is genomic DNA amplifica-
tion, which almost always results in overproduction of c-erbB-2 mRNA and
protein. The c-erbB-2 amplification may stabilize the overproduction of
mRNA or protein through unknown mechanisms. Human breast carcinomas
with amplification contain 2 to 40 times more c-erbB-2 DNA and 4 to
128 times more c-erbB-2 mRNA than found in normal tissue. Most human
breast carcinomas with c-erbB-2 amplification have 2 to 15 times more c-erbB-2
DNA. Tumors with greater amplification tend to have greater overproduc-
tion. The non-mammary neoplasms that have been studied tend to have
similar levels of c-erbB-2 amplification or overproduction relative to the corre-
sponding normal tissue.

The second most common mechanism of c-erbB-2 activation is overproduc-
tion of c-erbB-2 mRNA and protein without amplification of c-erbB-2 DNA.91
The quantities of mRNA and protein usually are less than those in amplified
cases and may approach the small quantities present in normal breast or other
tissues. The c-erbB-2 protein overproduction without mRNA overproduc-
tion or DNA amplification has been described in a few human breast carcinoma
cell lines.

Other rare mechanisms of c-erbB-2 activation have been reported. Translo-
cations involving the c-erbB-2 gene have been described in a few mammary and
gastric carcinomas, although some reported cases may represent restriction
fragment length polymorphisms or incomplete restriction enzyme digestions
that mimic translocations. A single point mutation in the transmem-
brane portion of neu has been described in rat neuroblastomas induced by
ethylnitrosourea. The mutated neu protein has increased tyrosine kinase activ-
ity and aggregates at the cell membrane. Although there has been specula-
tion that some of the amplified c-erbB-2 genes may contain point mutations,48
none has been detected in primary human neoplasms.



TECHNIQUES FOR DETECTING c-erbB-2 ACTIVATION
Detection of c-erbB-2 DNA Amplification

Amplification of c-erbB-2 DNA is usually detected by DNA dot blot or South-
ern blot hybridization. In the dot blot method, the extracted DNA is placed
directly on a nylon membrane and hybridized with a c-erbB-2 DNA probe. In
the Southern blot method, the extracted DNA is treated with a restriction
enzyme, and the fragments are separated by electrophoresis, transferred to a
nylon membrane, and hybridized with a c-erbB-2 DNA probe. In both tech-
niques, c-erbB-2 amplification is quantified by comparing the intensity (mea-
sured by densitometry) of the hybridization bands from the sample with those
from control tissue.

Several technical problems may complicate the measurement of c-erbB-2
DNA amplification. First, the extracted tumor DNA may be excessively
degraded or diluted by DNA from stromal cells. Second, the c-erbB-2 DNA
probe must be carefully chosen and labeled. For example, oligonucleotide
c-erbB-2 probes may not be sensitive enough for measuring a low level of
c-erbB-2 amplification, because diploid copy numbers can be difficult to detect
(unpublished data). Third, the total amounts of DNA in the sample and control tissue
must be compensated for, often with a probe to an unamplified gene. Many
studies have used control probes to genes on chromosome 17, the location of
c-erbB-2, to correct for possible alterations in chromosome number. Identical
results, however, are obtained by using control probes to genes on other
chromosomes, 5 with rare exception. Studies using control probes to the
beta-globin gene must be interpreted with caution, because one allele of this gene is
deleted occasionally in breast carcinomas. 3
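
The compensation for DNA loading described above can be illustrated with a short calculation. The following is only a sketch and is not taken from the studies reviewed: it assumes that densitometry yields one band intensity (arbitrary units) for the c-erbB-2 probe and one for a control probe to an unamplified gene, in both the tumor sample and the control tissue. The function name and the example values are hypothetical.

```python
def fold_amplification(tumor_target, tumor_control, normal_target, normal_control):
    """Estimate relative c-erbB-2 copy number from densitometry readings.

    Every argument is a band intensity in arbitrary units (hypothetical inputs).
    Dividing each c-erbB-2 signal by the control-probe signal from the same
    lane compensates for unequal amounts of DNA in the two lanes.
    """
    for value in (tumor_target, tumor_control, normal_target, normal_control):
        if value <= 0:
            raise ValueError("band intensities must be positive")
    tumor_ratio = tumor_target / tumor_control
    normal_ratio = normal_target / normal_control
    return tumor_ratio / normal_ratio


# Illustrative numbers only: the tumor lane holds half as much DNA as the
# control lane (control probe 100 vs. 200), yet its c-erbB-2 band is twice
# as intense (800 vs. 400).
example = fold_amplification(
    tumor_target=800.0, tumor_control=100.0,
    normal_target=400.0, normal_control=200.0,
)
print(f"estimated fold amplification: {example:.1f}")
```

Without the control-probe correction the same readings would suggest only a twofold difference, which is why the choice of control gene matters.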

Amplification of c-erbB-2 DNA was assessed by using the polymerase
chain reaction (PCR) in one recent study. 32 Oligoprimers for the c-erbB-2 gene
and a control gene are added to the sample DNA, and PCR is performed. If
the sample contains more copies of c-erbB-2 DNA than of the control gene, the
c-erbB-2 DNA is replicated preferentially.

Detection of c-erbB-2 mRNA Overproduction

Overproduction of c-erbB-2 mRNA usually is measured by RNA dot blot or
Northern blot hybridization. Both techniques require extraction of RNA but
otherwise are analogous to DNA dot blot and Southern blot hybridization. Use
of PCR for detection of c-erbB-2 mRNA has been described in two recent
abstracts. 88, 102

Overproduction of c-erbB-2 mRNA can be measured by in situ hybridization.
Sections are mounted on glass slides, treated with protease, hybridized
with a radiolabeled probe, washed, treated with nuclease to remove unbound
probe, and developed for autoradiography. Silver grains are seen only over
tumor cells that overproduce c-erbB-2 mRNA. Negative control probes are
used. Our experience indicates that these techniques are relatively insensitive
for detecting c-erbB-2 mRNA overproduction in routinely processed
tissue. Although the sensitivity may be increased by modifications that allow
simultaneous detection of c-erbB-2 DNA and mRNA, in situ hybridization still
is cumbersome and expensive (unpublished data).

All of the above c-erbB-2 mRNA detection techniques have several problems
that make them more difficult to perform than techniques for detecting
DNA amplification. One major problem is the rapid degradation of RNA in
tissue that is not immediately frozen or fixed. In addition, during the detection
procedure, RNA can be degraded by RNase, a ubiquitous enzyme, which must
be eliminated meticulously from laboratory solutions. Third, control probes to
genes that are uniformly expressed in the tissue of interest need to be carefully
selected.

Detection of c-erbB-2 Protein Overproduction

The most accurate methods for detecting c-erbB-2 protein overproduction are
the Western blot method and immunoprecipitation. Both techniques can
document the binding specificity of various antibodies against c-erbB-2 protein. In
Western blot studies, protein is extracted from the tissue, separated by
electrophoresis (according to size), transferred to a membrane, and detected by using
antibodies to c-erbB-2. In immunoprecipitation studies, antibodies against
c-erbB-2 are added to a tumor lysate, and the resulting protein-antibody precipitate is
separated by gel electrophoresis and stained for protein. Both Western blot and
immunoprecipitation are useful research tools but currently are not practical for
diagnostic pathology. Two recent abstracts have described an enzyme-linked
immunosorbent assay (ELISA) for detection of c-erbB-2 protein.

Overproduction of c-erbB-2 protein is most commonly assessed by various
immunohistochemical techniques. These procedures often generate conflicting
results, which are explained at least partially by three factors. First, various
studies have used different polyclonal and monoclonal antibodies. Because
some polyclonal antibodies recognize weak bands in addition to the c-erbB-2
protein band on Western blot or immunoprecipitation, the results of these
studies should be interpreted with caution. 22, 47, 61 Even some monoclonal
antibodies immunoprecipitate protein bands in addition to c-erbB-2 (p185). 30, 59, 88
Second, tissue fixation contributes to variability between studies. For example,
some antibodies detect c-erbB-2 protein only in frozen tissue and do not react
in fixed tissue. In general, formalin fixation diminishes the sensitivity of
immunohistochemical methods and decreases the number of reactive cells. 85
When Bouin's fixative is used, there may be a higher percentage of positive
cases. Third, minimal criteria for interpreting immunohistochemical staining
are generally lacking. Although there is general agreement that distinct crisp
cytoplasmic membrane staining is diagnostic for c-erbB-2 activation in breast
carcinoma, the number of positive cells and the staining intensity required to
diagnose c-erbB-2 protein overproduction vary from study to study and from
antibody to antibody. Degradation of c-erbB-2 protein is not a problem because
it can be detected in intact form more than 24 hours after tumor resection
without fixation or freezing.

ACTIVATION OF c-erbB-2 IN BREAST LESIONS
Incidence of c-erbB-2 Activation

Most studies of c-erbB-2 oncogene activation do not specify histological
subtypes of infiltrating breast carcinoma. Amplification of c-erbB-2 DNA was found
in 19.1 percent (519 of 2715) of invasive carcinomas in 25 studies (Table 1), and
c-erbB-2 mRNA or protein overproduction was detected in 20.9 percent (566 of
2714) of invasive carcinomas in 20 studies. Twelve studies have documented
c-erbB-2 mRNA or protein overproduction in 15 percent (88 of 604) of carcinomas
that lacked c-erbB-2 DNA amplification.

The incidence of c-erbB-2 activation in infiltrating breast carcinoma varies
with the histological subtype. Approximately 22 percent (142 of 650) of
infiltrating ductal carcinomas have c-erbB-2 activation, as expected from the above
data. Other variants of breast carcinoma with frequent c-erbB-2 activation are
inflammatory carcinoma (62 percent, 54 of 87), Paget's disease (82 percent, 9 of
11), and medullary carcinoma (22 percent, 5 of 23). In contrast, c-erbB-2
activation is infrequent in infiltrating lobular carcinoma (7 percent, 5 of 73) and
tubular carcinoma (7 percent, 1 of 15).

The c-erbB-2 protein overproduction is present in 44 percent (44 of 100) of
ductal carcinomas in situ and especially comedocarcinoma in situ (68 percent,
49 of 72). The micropapillary type of ductal carcinoma in situ also tends to have
c-erbB-2 activation, 54, 68 especially if larger cells are present. The greater
frequency of c-erbB-2 protein overproduction in comedocarcinoma in situ,
compared with infiltrating ductal carcinoma, could be explained by the fact that
many infiltrating ductal carcinomas arise from other types of intraductal
carcinoma, which show c-erbB-2 activation infrequently. Others have speculated
that carcinoma in situ with c-erbB-2 activation tends to regress or to lose
c-erbB-2 activation during progression to invasion. 68 Infiltrating and in situ
components of ductal carcinoma, however, usually are similar with respect to
c-erbB-2 activation, although some authors have noted more heterogeneity of
the immunohistochemical staining pattern in invasive than in in situ
carcinoma. Activation of c-erbB-2 is infrequent in lobular carcinoma in situ. If
lesions contain more than one histological pattern of carcinoma in situ, the
c-erbB-2 protein overproduction tends to occur in the comedocarcinoma in situ
but may include other areas of carcinoma in situ. Overproduction of
c-erbB-2 protein in ductal carcinoma in situ correlates with larger cell size and a
periductal lymphoid infiltrate.
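
The incidence figures in this section are simple pooled proportions (positive cases over all cases, summed across studies). The sketch below recomputes a few of them from the counts quoted in the text. It is offered only as an arithmetic check under that pooling assumption; the function name is hypothetical, and no study-level weighting is attempted.

```python
def pooled_percent(positive, total):
    """Return the pooled incidence as a percentage, rounded to one decimal."""
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("need total > 0 and 0 <= positive <= total")
    return round(100.0 * positive / total, 1)


# (positive cases, total cases) as quoted in the text.
counts = {
    "DNA amplification, invasive carcinoma": (519, 2715),
    "mRNA or protein overproduction, invasive carcinoma": (566, 2714),
    "infiltrating ductal carcinoma": (142, 650),
    "inflammatory carcinoma": (54, 87),
    "infiltrating lobular carcinoma": (5, 73),
}

for label, (positive, total) in counts.items():
    print(f"{label}: {pooled_percent(positive, total)} percent")
```

The recomputed values agree with the rounded percentages given in the text for these five groups.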

Activation of c-erbB-2 has not been identified in benign breast lesions,
including fibrocystic disease, fibroadenomas, and radial scars (Table 2). Strong
membrane immunohistochemical reactivity for c-erbB-2 has not been described
in atypical ductal hyperplasia, although weak accentuation of membrane staining
has been noted infrequently. In normal breast tissue, c-erbB-2 DNA is
diploid, and c-erbB-2 is expressed at lower levels than in activated tumors. 35, 88

These preliminary data suggest that c-erbB-2 activation may not be useful
for resolving many of the common problems in diagnostic surgical pathology. For
example, c-erbB-2 activation is infrequent in tubular carcinoma and radial scars.
In addition, because c-erbB-2 activation is unusual in atypical ductal hyperplasia,
cribriform carcinoma in situ, and papillary carcinoma in situ, detection of
c-erbB-2 activation in these lesions may not be helpful in their differential diagnosis. The
histological features of comedocarcinoma in situ, which commonly overproduces
c-erbB-2, are unlikely to be mistaken for those of benign lesions. Activation of
c-erbB-2, however, does favor infiltrating ductal carcinoma over infiltrating
lobular carcinoma. Further studies of these issues would be useful.

Correlation of c-erbB-2 Activation With Pathologic Prognostic Factors

Multiple studies have attempted to correlate c-erbB-2 activation with various
pathologic prognostic factors (Table 3). Activation of c-erbB-2 was correlated
with lymph node metastasis in 8 of 28 series, with higher histological grade in 6
of 17 series, and with higher stage in 4 of 14 series. Large tumor size was not
associated with c-erbB-2 activation in most studies (11 of 14). Tetraploid DNA
content and low proliferation, measured by Ki-67, have been suggested as
prognostic factors and may correlate with c-erbB-2 activation.

Correlation of c-erbB-2 Activation With Clinical Prognostic Factors

Various studies have attempted also to correlate c-erbB-2 activation with clinical
features that may predict a poor outcome (Table 4). Activation of c-erbB-2
correlated with absence of estrogen receptors in 10 of 28 series and with
absence of progesterone receptors in 6 of 18 series. In most studies, patient age
did not correlate with c-erbB-2 activation, and, in the rest of the reports,
c-erbB-2 activation was associated with either younger or older ages.

Correlation of c-erbB-2 Activation With Patient Outcome

Slamon et al first showed that amplification of the c-erbB-2 oncogene
independently predicts decreased survival of patients with breast carcinoma. The
correlation of c-erbB-2 amplification with poor outcome was nearly as strong as
the correlation of number of involved lymph nodes with poor outcome. Slamon
et al also reported that c-erbB-2 amplification is an important prognostic
indicator only in patients with lymph node metastasis. 70, 81

A large number of subsequent studies also attempted to correlate c-erbB-2
activation with prognosis (Table 5). In 12 series, there was a correlation
between c-erbB-2 activation and tumor recurrence or decreased survival. In five
of these series, the predictive value of c-erbB-2 activation was reported to be
independent of other prognostic factors. In contrast, 18 series did not confirm
the correlation of c-erbB-2 activation with recurrence or survival. Four possible
explanations for this controversy are discussed below.

One problem is that c-erbB-2 amplification correlates with prognosis
mainly in patients with lymph node metastasis. As summarized in Table 5, most
studies of patients with axillary lymph node metastasis showed a correlation of
c-erbB-2 activation with poor outcome. In contrast, most studies of patients
without axillary metastasis have not demonstrated a correlation with patient
outcome. Table 6 summarizes the studies in which all patients (with and
without axillary metastasis) were considered as one group. There is a trend for
studies with a higher percentage of metastatic cases to show an association
between c-erbB-2 activation and poor outcome. Thus, most of the current
evidence suggests that c-erbB-2 activation has prognostic value only in patients
with metastasis to lymph nodes.

TABLE 5. CORRELATION OF c-erbB-2 ACTIVATION WITH OUTCOME IN PATIENTS
WITH BREAST CARCINOMA

            c-erbB-2                     Axillary Lymph Nodes /  Statistical
P (a)       Activation (b)  Total        No Metastasis           Analysis (c)  Reference

<0.05       DNA             [illegible]                          M             67
<0.05       DNA             [illegible]                          U             60
<0.05       DNA             [illegible]                          U             65, 93
<0.05       DNA             [illegible]                          U
<0.05       mRNA            [illegible]                          U             65
<0.05       Protein                                              M, U          [illegible]
<0.05       DNA                          [illegible]             M             61
<0.05       DNA                          [illegible]             U             17
<0.05       DNA                                                  U             87
<0.05       DNA                          [illegible]             M             79
<0.05       Protein-WB                   350                     M
<0.05       Protein                      62 44                   U             101
0.05-0.15   DNA             67                                   U             111
0.05-0.15   Protein         169                                  M             92
0.05-0.15   Protein                      120                     U             86
>0.15       DNA             130                                  U             113
>0.15       DNA             122                                  M             4
>0.15       DNA             50                                   U             44
>0.15       mRNA            57                                   U             50
>0.15       Protein         280                                  M             66
>0.15       Protein         195                                  U             11, 39
>0.15       Protein         102                                  U             [illegible]
>0.15       Protein                      137                     U             17
>0.15       DNA                          181                     M             61
>0.15       DNA                          159                     U             [illegible]
>0.15       DNA                          73                      U             87
>0.15       Protein-WB                   378                     U             65
>0.15       Protein-WB                   192                     U             17
>0.15       Protein                      141                     U             86
>0.15       Protein                      41                      U             40

(a) The endpoints of these studies were tumor recurrence or decreased survival or both. Correlation between
c-erbB-2 activation and a poorer patient outcome is statistically significant at <0.05, is of equivocal significance
at 0.05 to 0.15, and is not significant at >0.15.

(b) Shown as variable measured. Letters "WB" indicate assay by Western blot; the other protein studies used
immunohistochemical methods.

(c) M = multivariate statistical analysis; U = univariate statistical analysis.

[Table 6: percentage of tumors with lymph node metastasis in each study (vertical axis, 40 to 70 percent) against P for correlation of c-erbB-2 activation with patient outcome (P<0.05, 0.05<P<0.15, P>0.15). Values as extracted: 71 (DNA), 61 (DNA), 69 (DNA), 58 (Protein), 64 (DNA), 42 (Protein), 64 (mRNA), 61 (DNA), 57 (DNA), 55 (Protein), 48 (Protein), 46 (Protein).]
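
The three significance bands used to group the studies in Table 5 (significant below 0.05, equivocal from 0.05 to 0.15, not significant above 0.15) can be written as a small helper. This is a sketch only: the function name is hypothetical, and treating the boundary values 0.05 and 0.15 as equivocal is an assumption based on the wording of the table footnote.

```python
def significance_band(p_value):
    """Classify a P value using the three bands of the Table 5 footnote."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p_value < 0.05:
        return "significant"
    if p_value <= 0.15:
        # Assumption: both boundary values fall in the equivocal band.
        return "equivocal"
    return "not significant"


for p in (0.01, 0.05, 0.10, 0.15, 0.40):
    print(p, significance_band(p))
```

Applied to the series counted in the text, 12 fall in the first band and the 18 that did not confirm the correlation fall in the other two.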



A second problem is that various types of breast carcinoma are grouped
together in many survival studies. Because the current literature suggests that
c-erbB-2 activation is infrequent in lobular carcinoma, studies that combine
infiltrating ductal and lobular carcinomas may dilute the prognostic effect of
c-erbB-2 activation in ductal tumors. In addition, most studies do not analyze
inflammatory breast carcinoma separately. This condition frequently shows
c-erbB-2 activation and has a worse prognosis than the usual mammary
carcinoma, but it is an uncommon lesion.

A third potential problem is the paucity of studies that attempt to correlate
c-erbB-2 activation with clinical outcome in subsets of breast carcinoma without
metastasis. Two recent abstracts reported that in patients without lymph node
metastasis who had various risk factors for recurrence (such as large tumor size
and absence of estrogen receptors), c-erbB-2 overexpression predicted early
recurrence. In patients with ductal carcinoma in situ, one small study found
no association between tumor recurrence and c-erbB-2 activation.

A fourth problem is the lack of data regarding whether the prognosis
correlates better with c-erbB-2 DNA amplification or with mRNA or protein
overproduction. Most studies that find a correlation between c-erbB-2
activation and poor patient outcome measure c-erbB-2 DNA amplification (Table 5),
and breast carcinoma patients with greater amplification of c-erbB-2 may have
[illegible]. Studies suggest [illegible] more prognostic [illegible]
overproduction. The clinical significance of c-erbB-2
overproduction without DNA amplification deserves further research. Few
studies have attempted to correlate patient outcome with c-erbB-2 mRNA
[illegible] of c-erbB-2 protein overproduction use [illegible]
such as immunohistochemical [illegible] with poly[illegible]

Comparison of c-erbB-2 Activation With Other Oncogenes in
Breast Carcinoma

Other oncogenes that may have prognostic implications in human breast cancer
[illegible] section [illegible] to a comparison
between the clinical relevance of c-erbB-2 and these other oncogenes.

The c-myc gene is often activated in breast carcinomas, but c-myc
activation generally has less prognostic importance than c-erbB-2 activation.
One study found a correlation between increased mRNAs of c-erbB-2 and
c-myc, [illegible] reports have not confirmed [illegible]. Subsequent research,
however, could demonstrate a subset of breast carcinomas in which c-myc has
more prognostic importance than c-erbB-2.

The gene c-erbB-1 for the epidermal growth factor receptor (EGFR) is
homologous with c-erbB-2 but is infrequently amplified in breast carcinomas.
Overproduction of EGFR, however, occurs more frequently than amplification
and may correlate with a poor prognosis. In studies that have examined both
c-erbB-2 and EGFR in the same tumor, c-erbB-2 has a stronger correlation with
poor prognostic factors. Studies have tended to show no correlation between
amplification of c-erbB-2 and c-erbB-1 or overproduction of c-erbB-2 and EGFR,
although at the molecular level EGFR mediates phosphorylation of c-erbB-2
protein. Recent reviews describe EGFR in breast carcinoma.

The genes c-erbA and ear-1 are homologous to the thyroid hormone receptor,
and they are located adjacent to c-erbB-2 on chromosome 17. These genes
are frequently coamplified with c-erbB-2 in breast carcinomas. The absence of
c-erbA expression in breast carcinomas, however, is evidence against an
important role for this gene in breast neoplasia. Amplification of c-erbB-2 can occur
without ear-1 amplification, and these tumors have a decreased survival that is
similar to tumors with both c-erbB-2 and ear-1 amplification. Consequently,
c-erbB-2 amplification seems to be more important than amplification of c-erbA.



Other genes also have been compared with c-erbB-2 activation in breast
carcinomas. One study found a significant correlation between increased c-erbB-2
[illegible] mRNAs [illegible] platelet-derived growth factor chain A,
and Ki-ras. Allelic deletion of c-Ha-ras may indicate a poorer prognosis in
breast carcinoma, but it has not been compared with c-erbB-2 activation. Some
studies have suggested a correlation between advanced stage or recurrence of
breast carcinoma and activation of any one of several oncogenes.

ACTIVATION OF c-erbB-2 IN NON-MAMMARY TISSUES

Incidence of c-erbB-2 Activation in Non-Mammary Tissues

Table 7 summarizes the normal tissues in which c-erbB-2 expression has been
detected, usually with ixnmunohistochemical methods using polyclonal anti- 

TABLE 7. PRESENCE OR ABSENCE OF c-erbB-2 mRNA OR c-erbB-2 PROTEIN IN
NORMAL HUMAN TISSUES

Tissues With       Tissues Producing     Tissues Lacking     Tissues Lacking
c-erbB-2 mRNA      c-erbB-2 Protein*     c-erbB-2 mRNA       c-erbB-2 Protein



Skin* 


Epidermis* 






External root sheath** 






Eocrtne sweat gland 5 * 






Fetal oral mucosa 62 


Postnatal oral mucosa 62 


Stomach 2 * 


Fetal esophagus 62 


Postnatal esophagus 62 


Stomach 2 *** 






Fetal Intestine 02 * 




Jejunum 84 


Small Intestine 22 ." 




Colons 


Colon 2 * 62 




Kidney 24 


Fetal kidney 62 * Kidneys"* 


Glomerulus® 




Fetal proximal tubule 62 


Postnatal Bowman's capsule 62 




Postnatal proximal tubule 62 




Distal tubule 62 




Fetal collecting duct 62 


Postnatal collecting duct 62 




Fetal renal pelvis 62 


Postnatal renal pelvis 62 




Fetal ureter 62 


Postnatal fetal ureter 62 


Uver 24 


Hepatocytes 22 


Uver* 2 ^ 




Pancreatic acini 22 






Pancreatic ducts 22 * 62 






Endocrine cede of Islets 


Pancreatic Islets 62 




of Langemans 22 




Lung 24 


Fetal trachea 62 . 


Postnatal trachea 62 


Fetal bronchioles 62 ^ 


Postnatal bronchioles 62 




Bronchioles 6 * 






Postnatal alveoli 62 - 6 * 


Fetal brain 24 


Fetal ganglion cells 62 


Postnatal brain 62 




Postnatal ganglion cells 62 


Thyroid' 





Uterus 24 

Ovary" 

Bloodvessels 42 Endothelium 62 

Placenta 24 

Adrenocortical cells 62 
Postnatal thymus 62 
Fibroblasts 62 
Smooth muscle ceils 62 

Cardiac muscle cells? 2 



*This protein study used Western blots; the rest used immunohistochemical methods.
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bodies. Only a few studies have been performed, and some of these do not 
demonstrate convincing cell membrane reactivity in the published
photographs. The interpretations in these studies, however, are listed, with the
caveat that these findings should be confirmed by immunoprecipitation or
Western or RNA blots. Production of c-erbB-2 has been identified in normal
epithelium of the gastrointestinal tract and skin. Discrepancies regarding
c-erbB-2 protein in other tissues could be due, at least in part, to differences in
techniques.

The data on c-erbB-2 activation in various non-mammary neoplasms
should be interpreted with caution, because only small numbers of tumors have
been studied, usually by immunohistochemical methods using polyclonal
antibodies. Studies using cell lines have been excluded, because cell culture can
induce amplification and overexpression of other genes, although this has not
been documented for c-erbB-2.

Activation of c-erbB-2 has been identified in 32 percent (64 of 203) of
ovarian carcinomas in eight studies (Table 8). One abstract stated that ovarian
carcinomas contained significantly more c-erbB-2 protein than ovarian
non-epithelial malignancies. Another report showed that 12 percent of ovarian
carcinomas had c-erbB-2 overproduction without amplification.

Activation of c-erbB-2 has been identified in 20 percent (40 of 108) of
gastric adenocarcinomas in seven studies, including 33 percent (21 of 64) of



TABLE 8. c-erbB-2 ACTIVATION IN HUMAN GYNECOLOGIC TUMORS*

                                                 c-erbB-2 mRNA      c-erbB-2 Protein
Tumor Type              c-erbB-2 DNA             Over-              Over-
                        Amplification            production         production


Ovary—Carcinoma, not otherwise 


31/120,* i/ii«r 


20/67* 


23/73* 


specified 


0/6,^0/5,1* OA"* 


38/72*1 




0/2 » Q/1tio 






Ovary— serous (papillary) carcinoma 








iOvary— endometrioid carcinoma - - 








Ovary— mucinous carcinoma 


1/2,™ 071* 






Ovary— clear cell carcinoma 








Ovary— mixed epithelial carcinoma 


0/2* 






Ovary— endometrioid borderline tumor 


0/172 






Ovary— mucinous borderline tumor 


0/3 7e 






Ovary— serous cystadenoma 


0/4* 






Ovary— mucinous cystadenoma 


0/2* 






Ovary— sclerosing stromal tumor 


on* 






Ovary— flbrothecoma 


0/1* 






Uterus— endometrial adenocarcinoma 


0/4." 071"° 







*Shown as number of cases with amplification (or overproduction)/total number of cases studied; reference is
given as superscript. All protein studies used immunohistochemical methods.

intestinal or tubular subtypes and 9 percent (4 of 47) of diffuse or signet ring cell
[illegible] carcinomas, although an additional immunohistochemical
study detected c-erbB-2 protein in seven of eight tissues fixed in Bouin's
solution. One study found greater immunohistochemical reactivity for c-erbB-2
protein in colonic adenomatous polyps than in the adjacent normal epithelium,
using Bouin's fixative. Lesions with anaplastic features and progression to
invasive carcinoma tended to show decreased immunohistochemical reactivity for
c-erbB-2 protein. Hepatocellular carcinomas (12 of 14 cases) and
cholangiocarcinomas (46 of 63 cases) reacted with antibodies against c-erbB-2 in one study, but
some of these positive cases showed only diffuse cytoplasmic staining, which



TABLE 9. c-erbB-2 ACTIVATION IN HUMAN GASTROINTESTINAL TUMORS*



Tumor Type



c-erbB-2 DNA
Amplification



c-erbB-2
Protein
Overproduction



Esophagus—squamous cell carcinoma
Stomach—carcinoma, poorly differentiated
Stomach—adenocarcinoma
Stomach—carcinoma, intestinal or tubular type
Stomach—carcinoma, diffuse or signet ring cell type
Colorectum—carcinoma



0/1 w 
0J22tM 

2/24* 2/9** 2/8,™ 
2/8 » OW* 

5/1 0*» 

0/21W 

2/49" 1/45,"' 
1/45,^1/45* 
0/40," 0/32,«»Oi3 B 
0/1» 
0/5*> 
077" 
071" 

0/1 » 



0/1« 

4£7,»3/10« 

16/54^ 
4/45® 

1/22,» 7/B 220 



Colon—villous adenoma
Colon—tubulovillous adenoma
Colon—tubular adenoma
Colon—hyperplastic polyp
Intestine—leiomyosarcoma
Hepatocellular carcinoma
Hepatoblastoma
Cholangiocarcinoma
Pancreas—adenocarcinoma
Pancreas—acinar carcinoma
Pancreas—clear cell carcinoma
Pancreas—large cell carcinoma
Pancreas—signet ring carcinoma
Pancreas—chronic inflammation

*Shown as number of cases with amplification (or overproduction)/total number of cases studied; reference is
given as superscript. All protein studies used immunohistochemical methods. No studies analyzed for c-erbB-2 mRNA.

Tissues fixed in Bouin's solution.

Only cases with distinct membrane staining are interpreted as showing c-erbB-2 overproduction.



19/10*6 
0/1" 

12/14?* 072« 

46/63» 
2/BO/" 072" 
0/1 « 
0/2*' 
0/3<* 

0/14** 




182 T.P. SINGLETON AND J.G. STRICKLER



TABLE 10. c-erbB-2 ACTIVATION IN HUMAN PULMONARY TUMORS*



Tumor Type



Non-small cell carcinoma
Epidermoid carcinoma
Adenocarcinoma
Large cell carcinoma
Small cell carcinoma
Carcinoid tumor



c-erbB-2 DNA
Amplification



2/60,» 0/60* 

O/13eO/10 f CT 0/6» 

0721* 1/13 » 0/7 «i 0/7 f 0/3«w 

0/1 « 



c-erbB-2
Protein
Overproduction



1/64« 

3/6* 

4/12» 

0/26»0/3» 
073 s9 



*Shown as number of cases with amplification (or overproduction)/total number of cases studied; reference is
given as superscript. All protein studies used immunohistochemical methods.



does not indicate c-erbB-2 activation in breast neoplasms. Also, some
pancreatic carcinomas and chronic pancreatitis tissue had cytoplasmic
immunohistochemical reactivity for c-erbB-2 protein, in addition to the rare case of
pancreatic adenocarcinoma with distinct cell membrane staining.

Tables 10 through 14 summarize the studies of c-erbB-2 activation in other
neoplasms. The c-erbB-2 oncogene is not activated in most of these tumors.
Activation of c-erbB-2 has been detected in 1 percent (4 of 299) of pulmonary
non-small cell carcinomas in nine studies, although one additional report
found c-erbB-2 protein overproduction in 41 percent (7 of 17). Renal cell
carcinoma had activation in 7 percent (2 of 30) in four studies. Overproduction
of c-erbB-2 protein was described in one transitional cell carcinoma of the
urinary bladder, a grade 2 papillary lesion. Squamous cell carcinoma and basal
cell carcinoma of the skin may contain c-erbB-2 protein, but it is not clear

TABLE 11. c-erbB-2 ACTIVATION IN HUMAN HEMATOLOGIC PROLIFERATIONS*



Tumor Type



Hematologic malignancies
Malignant lymphoma
Acute leukemia
Acute lymphoblastic leukemia
Acute myeloblastic leukemia
Chronic leukemia
Chronic lymphocytic leukemia
Chronic myelogenous leukemia
Myeloproliferative disorder



c-erbB-2 DNA
Amplification


c-erbB-2
mRNA
Overproduction


c-erbB-2
Protein
Overproduction


0/23"' 






Oygw 0/3 107 


0/1' 


0/15« 


0/14* 






0/1 w 






0/3^ 






0/16S? 






0/6™ ■ 






078W 






0/1* 







*Shown as number of cases with amplification (or overproduction)/total number of cases studied; reference is
given as superscript. All protein studies used immunohistochemical methods.

TABLE 12. c-erbB-2 ACTIVATION IN HUMAN TUMORS OF SOFT TISSUE AND BONE*

Tumor Type                        c-erbB-2 DNA Amplification

Sarcoma                           0/10, 0/8
Malignant fibrous histiocytoma    [illegible]
Liposarcoma
Pleomorphic sarcoma               0/1 107
Rhabdomyosarcoma
Osteogenic sarcoma                0/2, 0/2
Chondrosarcoma
Ewing's sarcoma                   0/1
Schwannoma                        0/1

*Shown as number of cases with amplification (or overproduction)/total number of cases studied; reference is
given as superscript. No studies analyzed for c-erbB-2 mRNA or c-erbB-2 protein.



whether the protein level is increased over that of normal skin. Thyroid
carcinomas and adenomas can have low levels of increased c-erbB-2 mRNA.
One abstract described low-level c-erbB-2 DNA amplification in one of ten
salivary gland pleomorphic adenomas.

Correlation of c-orbB-2 Activation With Patient Outcome 

Very few studies have attempted to correlate activation in non- 

mammary tumors with outcome.* Slamon et al« showed that c-erbB-2 amplifica- 
tion or overexpression in ovarian carcinomas correlates with decreased survival, 
especially when marked activation is present. However, they did not report the 
stage, histological grade, or histological subtype of these neoplasms. Another 
study of stages III and IV ovarian carcinomas found a correlation between 
decreased survival and c-er£B-2 protein overproduction, but not between sur- 
vival and histological grade," One abstract stated that o-er&B-2 protein overpro- 
duction in 10 of 16 pulmonary adenocarcinomas correlated with decreased 
disease-free interval ™ AnoQierabstmctdescribed a tendency fox rmmunohisto- 



TABLE 13. e-erl>B»2 ACTIVATION IN HUMAN TUMORS OF THE URINARY TRACT- 



Tumor Type 



Kidney— renal cell carcinoma 
Wilms* tumor 

Prostate— adenocardnoma 
Urinary bladder— carcinoma 



twiroB* DNA 
Amplification 



1/5,6* 1/4,"»0/5W 



o-erbB-2 
mRNA 
Over- 
production 



0/16«* 



c-eroB-2 
Protein . 
Over- 
production 



0/23" 
1/48» 



'Shown as number of cases with ampllf {cation (or overproductton^otaJ number of cases sNed; reference Is 
given as superscript. All protein studies used ImmunohlstochemicaJ methods. 
TABLE 14. c-erbB-2 ACTIVATION IN MISCELLANEOUS HUMAN TUMORS

Tumor Type

c-erbB-2 DNA
Amplification

c-erbB-2 mRNA
Overproduction

c-erbB-2 Protein
Overproduction

Skin— malignant melanoma
Skin, head and neck— squamous cell carcinoma
Site not stated— squamous cell carcinoma
Salivary gland— adenocarcinoma
Parotid gland— adenoid cystic carcinoma
Thyroid— anaplastic carcinoma
Thyroid— papillary carcinoma
Thyroid— adenocarcinoma
Thyroid— adenoma
Neuroblastoma
Meningioma



078,* 072* 



1/1* 



071' 
075* 
0/1" 
072' 

0735,* 078.^071 78 
072* 



0/10» 



0/1* 



0711 

3 (low levels)/5
1 (low levels)/2



Shown as number of cases with amplification (or overproduction)/total number of cases studied; reference is
given as superscript. All protein studies used immunohistochemical methods.

to grades of prostatic adenocarcinoma. Additional prognostic studies of ovarian carcinomas and
other neoplasms are needed.



SUMMARY 

Activation of the c-erbB-2 oncogene can occur by amplification of c-erbB-2
DNA and by overproduction of c-erbB-2 mRNA and c-erbB-2 protein. Approximately
20 percent of breast carcinomas show evidence of c-erbB-2 activation,
which correlates with a poor prognosis primarily in patients with metastasis to
axillary lymph nodes. Studies that have attempted to correlate c-erbB-2
activation with other prognostic factors in breast carcinoma have reported
conflicting conclusions. The pathologic and clinical significance of c-erbB-2
activation in other neoplasms is unclear and should be assessed by additional studies.
extracts. If these minor cell proteins differ among cells to the same extent as the
more abundant proteins, as is commonly assumed, only a small number of
protein differences (perhaps several hundred) suffice to create very large differences
in cell morphology and behavior.

A Cell Can Change the Expression of Its Genes 
in Response to External Signals 3 

Most of the specialized cells in a multicellular organism are capable of altering 
their patterns of gene expression in response to extracellular cues. If a liver cell 
is exposed to a glucocorticoid hormone, for example, the production of several
specific proteins is dramatically increased. Glucocorticoids are released during 
periods of starvation or intense exercise and signal the liver to increase the 
production of glucose from amino acids and other small molecules; the set of 
proteins whose production is induced includes enzymes such as tyrosine amino- 
transferase, which helps to convert tyrosine to glucose. When the hormone is no 
longer present, the production of these proteins drops to its normal level.

Other cell types respond to glucocorticoids in different ways. In fat cells, for 
example, the production of tyrosine aminotransferase is reduced, while some 
other cell types do not respond to glucocorticoids at all. These examples illustrate
a general feature of cell specialization: different cell types often respond in
different ways to the same extracellular signal. Underlying this specialization are
features that do not change, which give each cell type its permanently distinc- 
tive character. These features reflect the persistent expression of different sets of 
genes. 



Gene Expression Can Be Regulated at Many of the Steps 
in the Pathway from DNA to RNA to Protein 4 

If differences between the various cell types of an organism depend on the
particular genes that the cells express, at what level is the control of gene
expression exercised? There are many steps in the pathway leading from DNA to
protein, and all of them can in principle be regulated. Thus a cell can control
the proteins it makes by (1) controlling when and how often a given gene is
transcribed (transcriptional control), (2) controlling how the primary RNA
transcript is spliced or otherwise processed (RNA processing control), (3)
selecting which completed mRNAs in the cell nucleus are exported to the
cytoplasm (RNA transport control), (4) selecting which mRNAs in the cytoplasm
are translated by ribosomes (translational control), (5) selectively
destabilizing certain mRNA molecules in the cytoplasm (mRNA degradation
control), or (6) selectively activating, inactivating, or compartmentalizing
specific protein molecules after they have been made (protein activity
control) (Figure 9-2).

For most genes transcriptional controls are paramount. This makes sense
because, of all the possible control points illustrated in Figure 9-2, only
transcriptional control ensures that no superfluous intermediates are
synthesized. In the following sections we discuss the DNA and protein
components that regulate the initiation of gene transcription. We return at the
end of the chapter to the other ways of regulating gene expression.

Figure 9-2 Six steps at which eucaryote gene expression can be controlled.
Only controls that operate at steps 1 through 5 are discussed in this chapter.
The regulation of protein activity (step 6) is discussed in Chapter 5; this
includes reversible activation or inactivation by protein phosphorylation as
well as irreversible inactivation by proteolytic degradation.

Summary 

The genome of a cell contains in its DNA sequence the information to make many
thousands of different protein and RNA molecules. A cell typically expresses only a
fraction of its genes, and the different types of cells in multicellular organisms arise
because different sets of genes are expressed. Moreover, cells can change the pattern
of genes they express in response to changes in their environment, such as signals from
other cells. Although all of the steps involved in expressing a gene can in principle be
regulated, for most genes the initiation of RNA transcription is the most important
point of control.



DNA-binding Motifs in Gene 
Regulatory Proteins 5 

How does a cell determine which of its thousands of genes to transcribe? As dis- 
cussed in Chapter 8, the transcription of each gene is controlled by a regulatory 
region of DNA near the site where transcription begins. Some regulatory regions 
are simple and act as switches that are thrown by a single signal. Other regula- 
tory regions are complex and act as tiny microprocessors, responding to a vari- 
ety of signals that they interpret and integrate to switch the neighboring gene on 
or off. Whether complex or simple, these switching devices consist of two fun- 
damental types of components: (1) short stretches of DNA of defined sequence 
and (2) gene regulatory proteins that recognize and bind to them. 

We begin our discussion of gene regulatory proteins by describing how these 
proteins were discovered. 



Gene Regulatory Proteins Were Discovered Using 
Bacterial Genetics 6 

Genetic analyses in bacteria carried out in the 1950s provided the first evidence 
of the existence of gene regulatory proteins that turn specific sets of genes on 
or off. One of these regulators, the lambda repressor, is encoded by a bacterial 
virus, bacteriophage lambda. The repressor shuts off the viral genes that code for 
the protein components of new virus particles and thereby enables the viral ge- 
nome to remain a silent passenger in the bacterial chromosome, multiplying with 
the bacterium when conditions are favorable for bacterial growth (see Figure 
6-80). The lambda repressor was among the first gene regulatory proteins to be 
characterized, and it remains one of the best understood, as we discuss later. 
Other bacterial regulators respond to nutritional conditions by shutting off genes 
encoding specific sets of metabolic enzymes when they are not needed. The lac 
repressor, for example, the first of these bacterial proteins to be recognized, turns 
off the production of the proteins responsible for lactose metabolism when this 
sugar is absent from the medium. 

The first step toward understanding gene regulation was the isolation of
mutant strains of bacteria and bacteriophage lambda that were unable to shut
off specific sets of genes. It was proposed at the time, and later proved, that most
of these mutants were deficient in proteins acting as specific repressors for these
sets of genes. Because these proteins, like most gene regulatory proteins, are
present in small quantities, it was difficult and time-consuming to isolate them.
They were eventually purified by fractionating cell extracts on a series of
standard chromatography columns (see pp. 166-169). Once isolated, the
proteins were shown to bind to specific DNA sequences close to the genes that they


Figure 9-3 Double-helical structure
of DNA. The major and minor grooves
on the outside of the double helix are
indicated. The atoms are colored as
follows: carbon, dark blue; nitrogen,
light blue; hydrogen, white; oxygen,
red; phosphorus, yellow.
Figure 9-71 A mechanism to explain
both the marked deficiency of CG
sequences and the presence of CG
islands in vertebrate genomes. A
black line marks the location of an
unmethylated CG dinucleotide in the
DNA sequence, while a red line marks
the location of a methylated CG
dinucleotide.
Summary

The many types of cells in animals and plants are created largely through
mechanisms that cause different genes to be transcribed in different cells. Since many
specialized animal cells can maintain their unique character when grown in culture, the
gene regulatory mechanisms involved in creating them must be stable once
established and heritable when the cell divides, endowing the cell with a memory of its
developmental history. Procaryotes and yeasts provide unusually accessible model
systems in which to study gene regulatory mechanisms, some of which may be
relevant to the creation of specialized cell types in higher eucaryotes. One such
mechanism involves a competitive interaction between two (or more) gene regulatory
proteins, each of which inhibits the synthesis of the other; this can create a flip-flop
switch that switches a cell between two alternative patterns of gene expression.
Direct or indirect positive feedback loops, which enable gene regulatory proteins to
perpetuate their own synthesis, provide a general mechanism for cell memory.

In eucaryotes gene transcription is generally controlled by combinations of gene
regulatory proteins. It is thought that each type of cell in a higher eucaryotic organism
contains a specific combination of gene regulatory proteins that ensures the
expression of only those genes appropriate to that type of cell. A given gene regulatory
protein may be expressed in a variety of circumstances and typically is involved in the
regulation of many genes.

In addition to diffusible gene regulatory proteins, inherited states of chromatin 
condensation are also utilized by eucaryotic cells to regulate gene expression. In ver- 
tebrates DNA methylation also plays a part, mainly as a device to reinforce decisions 
about gene expression that are made initially by other mechanisms. 



Posttranscriptional Controls 

Although controls on the initiation of gene transcription are the predominant
form of regulation for most genes, other controls can act later in the pathway
from RNA to protein to modulate the amount of gene product that is made.
Although these posttranscriptional controls, which operate after RNA polymerase
has bound to the gene's promoter and begun RNA synthesis, are less common
than transcriptional control, for many genes they are crucial. It seems that every
step in gene expression that could be controlled in principle is likely to be
regulated under some circumstances for some genes.

We consider the varieties of posttranscriptional regulation in temporal
order, according to the sequence of events that might be experienced by an RNA
molecule after its transcription has begun (Figure 9-72).
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Figure 9-72 Possible post- 
transcriptional controls on gene 
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controls are likely to be used for any 
one gene. 






MOLECULAR BIOLOGY OF THE CELL



Bruce Alberts 
Alexander Johnson 
Julian Lewis 
Martin Raff 
Keith Roberts 
Peter Walter 



Garland Science 



fourth edition




Taylor & Francis Group



Garland 

Vice President: Denise Schanck 

Managing Editor: Sarah Gibbs 

Senior Editorial Assistant: Kirsten Jenner 

Managing Production Editor: Emma Hunt 

Proofreader and Layout: Emma Hunt 

Production Assistant: Angela Bennett

Text Editors: Marjorie Singer Anderson and Betsy Dilernia 

Copy Editor: Bruce Goatly 

Word Processors: Fran Dependahl, Misty Landers and Carol Winter 

Designer: Blink Studio, London 

Illustrator: Nigel Orme 

Indexer: Janine Ross and Sherry Granum

Manufacturing: Nigel Eyre and Marion Morrow 



Cell Biology Interactive 

Artistic and Scientific Direction: Peter Walter 

Narrated by: Julie Theriot 

Production, Design, and Development: Mike Morales 



Bruce Alberts received his Ph.D. from Harvard University and is 
President of the National Academy of Sciences and Professor of 
Biochemistry and Biophysics at the University of California, San 
Francisco. Alexander Johnson received his Ph.D. from Harvard 
University and is a Professor of Microbiology and Immunology at 
the University of California, San Francisco. Julian Lewis received 
his D.Phil. from the University of Oxford and is a Principal 
Scientist at the Imperial Cancer Research Fund, London. 
Martin Raff received his M.D. from McGill University and is at the 
Medical Research Council Laboratory for Molecular Cell Biology 
and Cell Biology Unit and in the Biology Department at University 
College London. Keith Roberts received his Ph.D. from the 
University of Cambridge and is Associate Research Director at the 
John Innes Centre, Norwich. Peter Walter received his Ph.D. from 
The Rockefeller University in New York and is Professor and 
Chairman of the Department of Biochemistry and Biophysics at 
the University of California, San Francisco, and an Investigator of 
the Howard Hughes Medical Institute. 



© 2002 by Bruce Alberts, Alexander Johnson, Julian Lewis, 
Martin Raff, Keith Roberts, and Peter Walter. 
© 1983, 1989, 1994 by Bruce Alberts, Dennis Bray, Julian Lewis, 
Martin Raff, Keith Roberts, and James D. Watson. 



All rights reserved. No part of this book covered by the copyright 
hereon may be reproduced or used in any format in any form or 
by any means— graphic, electronic, or mechanical, including 
photocopying, recording, taping, or information storage and 
retrieval systems— without permission of the publisher. 



Library of Congress Cataloging-in-Publication Data

Molecular biology of the cell / Bruce Alberts ... [et al.]. -- 4th ed.
p. cm 

Includes bibliographical references and index. 

ISBN 0-8153-3218-1 (hardbound) - ISBN 0-8153-4072-9 (pbk.) 

1. Cytology. 2. Molecular biology. I. Alberts, Bruce. 

[DNLM: 1. Cells. 2. Molecular Biology. 1 
QH581.2 .M64 2002 
571.6--dc21 

2001054471 CIP 



Published by Garland Science, a member of the Taylor & Francis Group, 
29 West 35th Street, New York, NY 10001-2299 

Printed in the United States of America 

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 



Front cover Human Genome: Reprinted by permission 
from Nature, International Human Genome Sequencing 
Consortium, 409:860-921, 2001 © Macmillan Magazines 
Ltd. Adapted from an image by Francis Collins, NHGRI; 
Jim Kent, UCSC; Ewan Birney, EBI; and Darryl Leja, 
NHGRI; showing a portion of Chromosome 1 from the 
initial sequencing of the human genome. 

Back cover In 1967, the British artist Peter Blake created
a design classic. Nearly 35 years later Nigel Orme
(illustrator), Richard Denyer (photographer), and the
authors have together produced an affectionate tribute
to Mr Blake's image. With its gallery of icons and
influences, its assembly created almost as much
complexity, intrigue and mystery as the original.
Drosophila, Arabidopsis, Dolly and the assembled
company tempt you to dip inside where, as in the
original, "a splendid time is guaranteed for all."
(Gunter Blobel, courtesy of The Rockefeller University; Marie
Curie, Keystone Press Agency Inc; Darwin bust, by permission
of the President and Council of the Royal Society; Rosalind
Franklin, courtesy of Cold Spring Harbor Laboratory Archives;
Dorothy Hodgkin, © The Nobel Foundation, 1964; James Joyce,
etching by Peter Blake; Robert Johnson, photo booth
self-portrait early 1930s, © 1986 Delta Haze Corporation all
rights reserved, used by permission; Albert L. Lehninger,
(unidentified photographer) courtesy of The Alan Mason
Chesney Medical Archives of The Johns Hopkins Medical
Institutions; Linus Pauling, from Ava Helen and Linus Pauling
Papers, Special Collections, Oregon State University; Nicholas
Poussin, courtesy of ArtToday.com; Barbara McClintock,
© David Micklos, 1983; Andrei Sakharov, courtesy of Elena
Bonner; Frederick Sanger, © The Nobel Foundation, 1958.)



Figure 6-3 Genes can be expressed
with different efficiencies. Gene A is
transcribed and translated much more
efficiently than gene B. This allows the
amount of protein A in the cell to be
much greater than that of protein B.
FROM DNA TO RNA

Transcription and translation are the means by which cells read out, or express, 
the genetic instructions in their genes. Because many identical RNA copies can 
be made from the same gene, and each RNA molecule can direct the synthesis 
of many identical protein molecules, cells can synthesize a large amount of 
protein rapidly when necessary. But each gene can also be transcribed and 
translated with a different efficiency, allowing the cell to make vast quantities of 
some proteins and tiny quantities of others (Figure 6-3). Moreover, as we see in 
the next chapter, a cell can change (or regulate) the expression of each of its 
genes according to the needs of the moment— most obviously by controlling 
the production of its RNA.
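
The amplification described in this paragraph is simple multiplication, which a short sketch can make concrete. The function name and the copy numbers below are hypothetical values chosen only for illustration; they are not measurements taken from the text.

```python
def protein_yield(rna_copies: int, proteins_per_rna: int) -> int:
    """Protein molecules made from one gene.

    Each gene can give rise to many identical RNA copies, and each RNA copy
    can direct the synthesis of many identical protein molecules.
    """
    return rna_copies * proteins_per_rna


# Hypothetical numbers: gene A is transcribed and translated far more
# efficiently than gene B (compare Figure 6-3).
gene_a = protein_yield(rna_copies=1000, proteins_per_rna=50)
gene_b = protein_yield(rna_copies=10, proteins_per_rna=5)

print(f"protein A: {gene_a}, protein B: {gene_b}, ratio: {gene_a // gene_b}")
```

With these assumed numbers, tenfold to hundredfold differences at each of the two steps compound into a thousandfold difference in final protein amount.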



Portions of DNA Sequence Are Transcribed into RNA 

The first step a cell takes in reading out a needed part of its genetic instructions 
is to copy a particular portion of its DNA nucleotide sequence — a gene — into an 
RNA nucleotide sequence. The information in RNA, although copied into another 
chemical form, is still written in essentially the same language as it is in DNA— 
the language of a nucleotide sequence. Hence the name transcription. 

Like DNA, RNA is a linear polymer made of four different types of nucleotide
subunits linked together by phosphodiester bonds (Figure 6-4). It differs from
DNA chemically in two respects: (1) the nucleotides in RNA are
ribonucleotides, that is, they contain the sugar ribose (hence the name
ribonucleic acid) rather than deoxyribose; (2) although, like DNA, RNA contains the
bases adenine (A), guanine (G), and cytosine (C), it contains the base uracil (U)
instead of the thymine (T) in DNA. Since U, like T, can base-pair by
hydrogen-bonding with A (Figure 6-5), the complementary base-pairing properties
described for DNA in Chapters 4 and 5 apply also to RNA (in RNA, G pairs with
C, and A pairs with U). It is not uncommon, however, to find other types of base
pairs in RNA: for example, G pairing with U occasionally.
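
The pairing rules in this paragraph are enough to write down what copying a DNA sequence into RNA produces. The following is a minimal sketch, not a model of RNA polymerase: the function name `transcribe` and the example sequence are invented for illustration, and the template strand is assumed to be written 3'-to-5' so that the two strings line up position by position.

```python
# Base-pairing rules when RNA is built on a DNA template:
# A (DNA) pairs with U (RNA), T with A, G with C, and C with G.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (read 5'->3') complementary to a DNA template strand.

    The template is assumed to be given 3'->5', so base i of the RNA pairs
    with base i of the template.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3_to_5.upper())


rna = transcribe("TACGGCATT")
print(rna)  # AUGCCGUAA
```

The occasional G-U pair mentioned above is deliberately left out; it concerns how an RNA folds on itself, not how it is copied from DNA.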

Despite these small chemical differences, DNA and RNA differ quite dra- 
matically in overall structure. Whereas DNA always occurs in cells as a double- 
stranded helix, RNA is single-stranded. RNA chains therefore fold up into a 
variety of shapes, just as a polypeptide chain folds up to form the final shape of 
a protein (Figure 6-6). As we see later in this chapter, the ability to fold into com- 
plex three-dimensional shapes allows some RNA molecules to have structural 
and catalytic functions. 



Transcription Produces RNA Complementary to 
One Strand of DNA 

All of the RNA in a cell is made by DNA transcription, a process that has cer- 
tain similarities to the process of DNA replication discussed in Chapter 5. 
Figure 6-89 Protein aggregates that cause human disease. (A) Schematic illustration of the type of
conformational change in a protein that produces material for a cross-beta filament. (B) Diagram illustrating
the self-infectious nature of the protein aggregation that is central to prion diseases. PrP is highly unusual
because the misfolded version of the protein, called PrP*, induces the normal PrP protein it contacts to
change its conformation, as shown. Most of the human diseases caused by protein aggregation are caused by
the overproduction of a variant protein that is especially prone to aggregation, but because this structure is
not infectious in this way, it cannot spread from one animal to another. (C) Drawing of a cross-beta filament,
a common type of protease-resistant protein aggregate found in a variety of human neurological diseases.
Because the hydrogen-bond interactions in a β sheet form between polypeptide backbone atoms (see Figure
3-9), a number of different abnormally folded proteins can produce this structure. (D) One of several
possible models for the conversion of PrP to PrP*, showing the likely change of two α-helices into four
β-strands. Although the structure of the normal protein has been determined accurately, the structure of the
infectious form is not yet known with certainty because the aggregation has prevented the use of standard
structural techniques. (C, courtesy of Louise Serpell, adapted from M. Sunde et al., J. Mol. Biol. 273:729-739,
1997; D, adapted from S.B. Prusiner, Trends Biochem. Sci. 21:482-487, 1996.)

animals and humans. It can be dangerous to eat the tissues of animals that con- 
tain PrP*, as witnessed most recently by the spread of BSE (commonly referred 
to as the "mad cow disease") from cattle to humans in Great Britain. 

Fortunately, in the absence of PrP*, PrP is extraordinarily difficult to convert 
to its abnormal form. Although very few proteins have the potential to misfold 
into an infectious conformation, a similar transformation has been discovered 
to be the cause of an otherwise mysterious "protein-only inheritance" observed 
in yeast cells. 

There Are Many Steps From DNA to Protein 

We have seen so far in this chapter that many different types of chemical reac- 
tions are required to produce a properly folded protein from the information 
contained in a gene (Figure 6-90). The final level of a properly folded protein in 
a cell therefore depends upon the efficiency with which each of the many steps 
is performed. 

We discuss in Chapter 7 that cells have the ability to change the levels of
their proteins according to their needs. In principle, any or all of the steps in
Figure 6-90 could be regulated by the cell for each individual protein. However, as
we shall see in Chapter 7, the initiation of transcription is the most common
point for a cell to regulate the expression of each of its genes. This makes sense,
inasmuch as the most efficient way to keep a gene from being expressed is to
block the very first step: the transcription of its DNA sequence into an RNA
molecule.

Figure 6-90 The production of a
protein by a eucaryotic cell. The final
level of each protein in a eucaryotic cell
depends upon the efficiency of each step
depicted.



Summary 

The translation of the nucleotide sequence of an mRNA molecule into protein takes 
place in the cytoplasm on a large ribonucleoprotein assembly called a ribosome. The 
amino acids used for protein synthesis are first attached to a family of tRNA 
molecules, each of which recognizes, by complementary base-pair interactions, par- 
ticular sets of three nucleotides in the mRNA (codons). The sequence of nucleotides in 
the mRNA is then read from one end to the other in sets of three according to the 
genetic code. 
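
Reading an mRNA "in sets of three" can be sketched in a few lines. This is an illustrative toy, not a description of the ribosome: the function name and the example mRNA are invented, and `CODON_TABLE` is only a small excerpt of the standard genetic code (the full code has 64 codons), so any codon missing from it is reported as "???".

```python
# Small excerpt of the standard genetic code; enough for the example below.
CODON_TABLE = {
    "AUG": "Met", "CCG": "Pro", "GGU": "Gly", "UUU": "Phe",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def translate(mrna: str) -> list[str]:
    """Read an mRNA from its first AUG, three nucleotides at a time,
    and stop at the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "???")
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide


print(translate("GGAUGCCGGGUUUUUAAGC"))  # ['Met', 'Pro', 'Gly', 'Phe']
```

Because the reading frame is set by the start codon, the two nucleotides before AUG in the example are skipped, and the nucleotides after UAA are never read.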

To initiate translation, a small ribosomal subunit binds to the mRNA molecule
at a start codon (AUG) that is recognized by a unique initiator tRNA molecule. A
large ribosomal subunit binds to complete the ribosome and begin the elongation
phase of protein synthesis. During this phase, aminoacyl tRNAs, each bearing a
specific amino acid, bind sequentially to the appropriate codon in mRNA by forming
complementary base pairs with the tRNA anticodon. Each amino acid is added to the
C-terminal end of the growing polypeptide by means of a cycle of three sequential
Figure 7-5 Six steps at which
eucaryotic gene expression can be
controlled. Controls that operate at
steps 1 through 5 are discussed in this
chapter. Step 6, the regulation of protein
activity, includes reversible activation or
inactivation by protein phosphorylation
(discussed in Chapter 3) as well as
irreversible inactivation by proteolytic
degradation (discussed in Chapter 6).



Gene Expression Can Be Regulated at Many of the Steps 
in the Pathway from DNA to RNA to Protein 

If differences among the various cell types of an organism depend on the partic- 
ular genes that the cells express, at what level is the control of gene expression 
exercised? As we saw in the last chapter, there are many steps in the pathway 
leading from DNA to protein, and all of them can in principle be regulated. Thus 
a cell can control the proteins it makes by (1) controlling when and how often a 
given gene is transcribed (transcriptional control), (2) controlling how the RNA 
transcript is spliced or otherwise processed (RNA processing control), (3) 
selecting which completed mRNAs in the cell nucleus are exported to the cytosol 
and determining where in the cytosol they are localized (RNA transport and
localization control), (4) selecting which mRNAs in the cytoplasm are translated 
by ribosomes (translational control), (5) selectively destabilizing certain mRNA 
molecules in the cytoplasm (mRNA degradation control), or (6) selectively acti- 
vating, inactivating, degrading, or compartmentalizing specific protein 
molecules after they have been made (protein activity control) (Figure 7-5). 

For most genes transcriptional controls are paramount. This makes sense 
because, of all the possible control points illustrated in Figure 7-5, only tran- 
scriptional control ensures that the cell will not synthesize superfluous interme- 
diates. In the following sections we discuss the DNA and protein components 
that perform this function by regulating the initiation of gene transcription. We 
shall return at the end of the chapter to the additional ways of regulating gene 
expression. 
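
The argument that only transcriptional control avoids superfluous intermediates can be restated as a small lookup. The sketch below is a deliberate simplification: the enum names, the four-item product list, and the assumption that a block at one control point leaves every upstream product to be synthesized anyway are all illustrative choices, not taken from the text.

```python
from enum import IntEnum


class ControlPoint(IntEnum):
    """The six control points, numbered as in the text."""
    TRANSCRIPTIONAL = 1
    RNA_PROCESSING = 2
    RNA_TRANSPORT = 3
    TRANSLATIONAL = 4
    MRNA_DEGRADATION = 5
    PROTEIN_ACTIVITY = 6


# Assumed order in which products appear on the way from DNA to protein.
PRODUCTS = ["primary RNA transcript", "processed mRNA", "cytosolic mRNA", "protein"]

# How many of those products already exist when each control point acts.
# Translational control and mRNA degradation both act on cytosolic mRNA.
MADE_BEFORE = {
    ControlPoint.TRANSCRIPTIONAL: 0,
    ControlPoint.RNA_PROCESSING: 1,
    ControlPoint.RNA_TRANSPORT: 2,
    ControlPoint.TRANSLATIONAL: 3,
    ControlPoint.MRNA_DEGRADATION: 3,
    ControlPoint.PROTEIN_ACTIVITY: 4,
}


def superfluous_intermediates(block: ControlPoint) -> list[str]:
    """Products the cell has already made if expression is shut off at `block`."""
    return PRODUCTS[:MADE_BEFORE[block]]


for point in ControlPoint:
    print(point.name, "->", superfluous_intermediates(point) or "nothing wasted")
```

Only the first entry comes back empty, which is the sense in which transcriptional control is the most economical place to switch a gene off.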

Summary 

The genome of a cell contains in its DNA sequence the information to make many
thousands of different protein and RNA molecules. A cell typically expresses only a
fraction of its genes, and the different types of cells in multicellular organisms arise
because different sets of genes are expressed. Moreover, cells can change the pattern
of genes they express in response to changes in their environment, such as signals
from other cells. Although all of the steps involved in expressing a gene can in
principle be regulated, for most genes the initiation of RNA transcription is the most
important point of control.



DNA-BINDING MOTIFS IN GENE REGULATORY 
PROTEINS 

How does a cell determine which of its thousands of genes to transcribe? As
mentioned briefly in Chapters 4 and 6, the transcription of each gene is
controlled by a regulatory region of DNA relatively near the site where transcription
begins. Some regulatory regions are simple and act as switches that are thrown
by a single signal. Many others are complex and act as tiny microprocessors,
responding to a variety of signals that they interpret and integrate to switch the
neighboring gene on or off. Whether complex or simple, these switching devices
occur in the germ line, the cell lineage that gives rise to sperm or eggs. Most of
the DNA in vertebrate germ cells is inactive and highly methylated. Over long
periods of evolutionary time, the methylated CG sequences in these inactive
regions have presumably been lost through spontaneous deamination events
that were not properly repaired. However, promoters of genes that remain active
in the germ cell lineages (including most housekeeping genes) are kept
unmethylated, and therefore spontaneous deaminations of Cs that occur
within them can be accurately repaired. Such regions are preserved in modern-day
vertebrate cells as CG islands. In addition, any mutation of a CG sequence in the
genome that destroyed the function or regulation of a gene in the adult would be
selected against, and some CG islands are simply the result of a higher than
normal density of critical CG sequences.

The mammalian genome contains an estimated 20,000 CG islands. Most of 
the islands mark the 5' ends of transcription units and thus, presumably, of 
genes. The presence of CG islands often provides a convenient way of identify- 
ing genes in the DNA sequences of vertebrate genomes. 
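
One way to see how CG islands could be used to flag candidate genes is to compare how many CG dinucleotides a stretch of DNA contains with how many its C and G content would predict. The sketch below uses a commonly used observed-to-expected ratio; this measure, the function name, and both example sequences are brought in for illustration and are not described in the text itself.

```python
def cpg_observed_over_expected(seq: str) -> float:
    """Ratio of observed CG dinucleotides to the number expected by chance.

    Expected count = (number of C * number of G) / sequence length.
    A ratio well below 1 indicates CG depletion; a ratio near or above 1
    is what an island-like stretch would show.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")
    return observed * len(seq) / (c_count * g_count)


island_like = "ACGCGTCGAT"        # invented sequence, rich in CG
depleted = "CAGTGCCTGGAACTGA"     # invented sequence, contains C and G but no CG

print(cpg_observed_over_expected(island_like))
print(cpg_observed_over_expected(depleted))
```

A real scan would slide a window along the genome and apply length and base-composition thresholds as well; those thresholds are omitted here because the text does not specify any.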

Summary 

The many types of cells in animals and plants are created largely through mecha- 
nisms that cause different genes to be transcribed in different cells. Since many 
specialized animal cells can maintain their unique character through many cell 
division cycles and even when grown in culture, the gene regulatory mechanisms 
involved in creating them must be stable once established and heritable when the 
cell divides. These features endow the cell with a memory of its developmental history. 
Bacteria and yeasts provide unusually accessible model systems in which to study 
gene regulatory mechanisms. One such mechanism involves a competitive interac- 
tion between two gene regulatory proteins, each of which inhibits the synthesis of the 
other; this can create a flip-flop switch that switches a cell between two alternative 
patterns of gene expression. Direct or indirect positive feedback loops, which enable
gene regulatory proteins to perpetuate their own synthesis, provide a general mech- 
anism for cell memory. Negative feedback loops with programmed delays form the 
basis for cellular clocks. 
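
The flip-flop switch mentioned in this summary can be illustrated with a toy simulation of two proteins that each repress the other's synthesis. Everything quantitative here is an assumption made for the sketch: the rate law, the parameter values `alpha` and `n`, and the simple Euler integration are illustrative choices, not a model given in the text.

```python
def simulate_toggle(a: float, b: float, alpha: float = 4.0, n: int = 2,
                    dt: float = 0.01, steps: int = 5000) -> tuple[float, float]:
    """Euler integration of two mutually repressing gene regulatory proteins.

    Each protein is synthesized at a rate that falls as the other protein
    accumulates, and each decays at a rate proportional to its own level.
    """
    for _ in range(steps):
        da = alpha / (1.0 + b ** n) - a  # synthesis of A is inhibited by B
        db = alpha / (1.0 + a ** n) - b  # synthesis of B is inhibited by A
        a, b = a + dt * da, b + dt * db
    return a, b


# Same cell, same parameters; only the starting levels differ.
a_wins = simulate_toggle(a=2.0, b=0.1)
b_wins = simulate_toggle(a=0.1, b=2.0)

print("started with more A:", a_wins)
print("started with more B:", b_wins)
```

Under these assumed parameters the system settles into one of two stable patterns, A high with B low or the reverse, depending only on where it started, which is the sense in which such a circuit gives a cell a memory of an earlier state.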

In eucaryotes the transcription of a gene is generally controlled by combinations 
of gene regulatory proteins. It is thought that each type of cell in a higher eucaryotic 
organism contains a specific combination of gene regulatory proteins that ensures 
the expression of only those genes appropriate to that type of cell. A given gene
regulatory protein may be active in a variety of circumstances and typically is involved
in the regulation of many genes. 

In addition to diffusible gene regulatory proteins, inherited states of chromatin 
condensation are also used by eucaryotic cells to regulate gene expression. An espe- 
cially dramatic case is the inactivation of an entire X chromosome in female mam- 
mals. In vertebrates DNA methylation also functions in gene regulation, being used 
mainly as a device to reinforce decisions about gene expression that are made ini- 
tially by other mechanisms. DNA methylation also underlies the phenomenon of 
genomic imprinting in mammals, in which the expression of a gene depends on 
whether it was inherited from the mother or the father. 



POSTTRANSCRIPTIONAL CONTROLS 

In principle, every step required for the process of gene expression could be 
controlled. Indeed, one can find examples of each type of regulation, although 
any one gene is likely to use only a few of them. Controls on the initiation of 
gene transcription are the predominant form of regulation for most genes. But 
other controls can act later in the pathway from DNA to protein to modulate 
the amount of gene product that is made. Although these posttranscriptional 
controls, which operate after RNA polymerase has bound to the gene's promoter 
and begun RNA synthesis, are less common than transcriptional control, for 
many genes they are crucial. 

Figure 7-86 A mechanism to explain both the marked overall deficiency of CG
sequences and their clustering into CG islands in vertebrate genomes. A black line
marks the location of a CG dinucleotide in the DNA sequence, while a red "lollipop"
indicates the presence of a methyl group on the CG dinucleotide. CG sequences that
lie in regulatory sequences of genes that are transcribed in germ cells are
unmethylated and therefore tend to be retained in evolution. Methylated CG
sequences, on the other hand, tend to be lost through deamination of 5-methyl C to
T, unless the CG sequence is critical for survival. 

CHAPTER 29 

Regulation of transcription 



Benjamin Lewin 



The phenotypic differences that distinguish the various kinds of cells in a higher
eukaryote are largely due to differences in the expression of genes that code for
proteins, that is, those transcribed by RNA polymerase II. In principle, the
expression of these genes might be regulated at any one of several stages. The
concept of the "level of control" implies that gene expression is not necessarily an
automatic process once it has begun. It could be regulated in a gene-specific way at
any one or several sequential steps. We can distinguish (at least) five potential
control points, running the series: 

Activation of gene structure 
↓ 
Initiation of transcription 
↓ 
Processing the transcript 
↓ 
Transport to cytoplasm 
↓ 
Translation of mRNA 

The existence of the first step is implied by the discovery that genes may exist in
either of two structural conditions. Relative to the state of most of the genome,
genes are found in an "active" state in the cells in which they are expressed (see
Chapter 27). The change of structure is distinct from the act of transcription, and
indicates that the gene is "transcribable." This suggests that acquisition of the
"active" structure must be the first step in gene expression. 

Transcription of a gene in the active state is controlled at the stage of initiation,
that is, by the interaction of RNA polymerase with its promoter. This is now becoming
susceptible to analysis in the in vitro systems (see Chapter *«). For most genes,
this is a major control point; probably it is the most common level of regulation. 

There is at present no evidence for control at subsequent stages of transcription in
eukaryotic cells, for example, via antitermination mechanisms. 

The primary transcript is modified by capping at the 5' end, and usually also by
polyadenylation at the 3' end. Introns must be spliced out from the transcripts of
interrupted genes. The mature RNA must be exported from the nucleus to the cytoplasm.
Regulation of gene expression by selection of sequences at the level of nuclear RNA
might involve any or all of these stages, but the one for which we have most evidence
concerns changes in splicing: some genes are expressed by means of alternative
splicing patterns whose regulation controls the type of protein product (see
Chapter 30). 

Finally, the translation of an mRNA in the cytoplasm can be specifically controlled.
There is little evidence for the employment of this mechanism in adult somatic cells,
but it does occur in some embryonic situations, as described in Chapter 7. The
mechanism is presumed to involve the blocking of initiation of translation of some
mRNAs by specific protein factors. 

But having acknowledged that control of gene expression can occur at multiple stages,
and that production of RNA cannot inevitably be equated with production of protein,
it is clear that the overwhelming majority of regulatory events occur at the
initiation of transcription. Regulation of tissue-specific gene transcription lies at
the heart of eukaryotic differentiation; indeed, we see examples in Chapter 38 in
which proteins that regulate embryonic development prove to be transcription factors.
A regulatory transcription factor serves to provide common control of a large number
of target genes, and we seek to answer two questions about this mode of regulation:
what identifies the common target genes to the transcription factor; and how is the
activity of the transcription factor itself regulated in response to intrinsic or
extrinsic signals? 



Response elements identify genes under common 
regulation 



The principle that emerges from characterizing groups of genes under common control
is that they share a promoter element that is recognized by a regulatory
transcription factor. An element that causes a gene to respond to such a factor is
called a response element; examples are the HSE (heat shock response element), GRE
(glucocorticoid response element), SRE (serum response element). 

The properties of some inducible transcription factors and the elements that they
recognize are summarized in Table 29.1. Response elements have the same general
characteristics as upstream elements of promoters or enhancers. They contain short
consensus sequences, and copies of the response elements found in different genes are
closely related, but not necessarily identical. The region bound by the factor
extends for a short distance on either side of the consensus sequence. In promoters,
the elements are not present at fixed distances from the startpoint, but are usually
<200 bp upstream of it. The presence of a single element usually is sufficient to
confer the regulatory response, but sometimes there are multiple copies. 

Table 29.1 Inducible transcription factors bind to response elements that identify
groups of promoters or enhancers subject to coordinate control. 

Regulatory Agent    Module    Consensus          Factor 
Heat shock          HSE       CNNGAANNTCCNNG     HSTF 
Glucocorticoid      GRE       TGGTACAAATGTTCT    Receptor 
Phorbol ester       TRE       TGACTCA            AP1 
Serum               SRE       CCATATTAGG         SRF 
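
As a rough illustration of how a consensus such as those in Table 29.1 could be used to flag candidate response elements, the following Python sketch scans a DNA string for exact matches, treating N as any base. It is a simplification under stated assumptions: copies of an element in real genes are closely related but not necessarily identical to the consensus, only the given strand is searched, and the function names and the test sequence are hypothetical.

```python
import re

# Consensus sequences as listed in Table 29.1; N stands for any base.
CONSENSUS = {
    "HSE": "CNNGAANNTCCNNG",
    "GRE": "TGGTACAAATGTTCT",
    "TRE": "TGACTCA",
    "SRE": "CCATATTAGG",
}


def consensus_to_regex(consensus):
    """Translate a consensus string into a regex; N matches any of A, C, G, T."""
    return re.compile("".join("[ACGT]" if base == "N" else base for base in consensus))


def find_elements(sequence, elements=CONSENSUS):
    """Return (name, 0-based start) for every exact consensus match on the given strand."""
    sequence = sequence.upper()
    hits = []
    for name, consensus in elements.items():
        pattern = consensus_to_regex(consensus).pattern
        # Wrapping the pattern in a lookahead also reports overlapping matches.
        for match in re.finditer("(?=(" + pattern + "))", sequence):
            hits.append((name, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical test sequence (not from the source): an HSE-like site, then a TRE.
promoter = "ttacCTAGAAGCTCCAAGgggTGACTCAtt"
print(find_elements(promoter))
```

A scoring scheme that tolerates mismatches would be closer to how related but non-identical copies are recognized; exact matching is used here only to keep the sketch short.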

Response elements may be located in promoters or in enhancers. Some types of element
are typically found in one rather than the other: usually an HSE is found in a
promoter, while a GRE is found in an enhancer. We assume that all response elements
function by the same general principle. A gene is regulated by a sequence at the
promoter or enhancer that is recognized by a specific protein. The protein functions
as a transcription factor needed for RNA polymerase to initiate. Active protein is
available only under conditions when the gene is to be expressed; its absence means
that the promoter is not activated by this particular circuit. 

An example of a situation in which genes are controlled by a single factor is
provided by the heat shock response. This is common to a wide range of prokaryotes
and eukaryotes and involves multiple controls of gene expression: an increase in
temperature turns off transcription of some genes, turns on transcription of the heat
shock genes, and causes changes in the translation of mRNAs. The control of the heat
shock genes illustrates the differences between prokaryotic and eukaryotic modes of
control. In bacteria, a new sigma factor is synthesized that directs RNA polymerase
holoenzyme to recognize an 

Abstract 

Background: Prostate stem cell antigen (PSCA) is a recently defined homologue of the Thy-1/Ly-6 family of 
glycosylphosphatidylinositol (GPI)-anchored cell surface antigens. The purpose of the present study was to 
examine the expression status of PSCA protein and mRNA in clinical specimens of human prostate cancer (PCa) 
and to validate it as a potential molecular target for diagnosis and treatment of PCa. 

Materials and Methods: Immunohistochemical (IHC) and in situ hybridization (ISH) analyses of PSCA 
expression were simultaneously performed on paraffin-embedded sections from 20 benign prostatic hyperplasia 
(BPH), 20 prostatic intraepithelial neoplasm (PIN) and 48 prostate cancer (PCa) tissues, including 9
androgen-independent prostate cancers. The level of PSCA expression was semiquantitatively scored by assessing both the 
percentage and intensity of PSCA-positive staining cells in the specimens. We then compared PSCA expression 
between BPH, PIN and PCa tissues and analysed the correlations of PSCA expression level with pathological grade, 
clinical stage and progression to androgen-independence in PCa. 

Results: In BPH and low grade PIN, PSCA protein and mRNA staining were weak or negative and less intense 
and uniform than that seen in HGPIN and PCa. There was moderate to strong PSCA protein and mRNA 
expression in 8 of 11 (72.7%) HGPIN and in 40 of 48 (83.4%) PCa specimens examined by IHC and ISH analyses, 
with statistical significance compared with BPH (20%) and low grade PIN (22.2%) samples (p < 0.05, respectively). 
The expression level of PSCA increased with high Gleason grade, advanced stage and progression to
androgen-independence (p < 0.05, respectively). In addition, IHC and ISH staining showed a high degree of correlation 
between PSCA protein and mRNA overexpression. 

Conclusions: Our data demonstrate that PSCA as a new cell surface marker is overexpressed by a majority of 
human PCa. PSCA expression correlates positively with adverse tumor characteristics, such as increasing 
pathological grade (poor cell differentiation), worsening clinical stage and androgen-independence, and 
speculatively with prostate carcinogenesis. PSCA protein overexpression results from upregulated transcription 
of PSCA mRNA. PSCA may have prognostic utility and may be a promising molecular target for diagnosis and 
treatment of PCa. 

Introduction 

Prostate cancer (PCa) is the second leading cause of cancer-related death in American
men and is becoming increasingly common in China. Despite recent great progress in
the diagnosis and management of localized disease, there continues to be a need for
new diagnostic markers that can accurately discriminate between indolent and
aggressive variants of PCa. There also continues to be a need for the identification
and characterization of potential new therapeutic targets on PCa cells. Current
diagnostic and therapeutic modalities for recurrent and metastatic PCa have been
limited by a lack of specific target antigens of PCa. 

Although a number of prostate-specific genes have been identified (e.g. prostate
specific antigen, prostatic acid phosphatase, glandular kallikrein 2), the majority
of these are secreted proteins not ideally suited for many immunological strategies.
The identification of new cell surface antigens is therefore critical to the
development of new diagnostic and therapeutic approaches to the management of PCa. 

Reiter RE et al [1] reported the identification of prostate stem cell antigen (PSCA),
a cell surface antigen that is predominantly prostate specific. The PSCA gene encodes
a 123 amino acid glycoprotein, with 30% homology to stem cell antigen 2 (Sca-2). Like
Sca-2, PSCA is a member of the Thy-1/Ly-6 family and is anchored by a
glycosylphosphatidylinositol (GPI) linkage. mRNA in situ hybridization (ISH)
localized PSCA expression in normal prostate to the basal cell epithelium, the
putative stem cell compartment of prostatic epithelium, suggesting that PSCA may be a
marker of prostate stem/progenitor cells. 

In order to examine the status of PSCA protein and mRNA expression in human PCa and
validate it as a potential diagnostic and therapeutic target for PCa, we used
immunohistochemistry (IHC) and in situ hybridization (ISH) simultaneously, and
conducted PSCA protein and mRNA expression analyses in paraffin-embedded tissue
specimens of benign prostatic hyperplasia (BPH, n = 20), prostate intraepithelial
neoplasm (PIN, n = 20) and prostate cancer (PCa, n = 48). Furthermore, we evaluated
the possible correlation of PSCA expression level with PCa tumorigenesis, grade,
stage and progression to androgen-independence. 

Materials and methods 
Tissue samples 

All of the clinical tissue specimens studied herein were obtained from 80 patients
aged 57-84 years by prostatectomy, transurethral resection of prostate (TURP) or
biopsy. The patients were classified as 20 cases of BPH, 20 cases of PIN and 40 cases
of primary PCa, including 9 patients with recurrent PCa and a history of androgen
ablation therapy (orchiectomy and/or hormonal therapy), who were referred to as
androgen-independent prostate cancers. Eight specimens were harvested from these
androgen-independent PCa patients prior to androgen ablation treatment. Each tissue
sample was cut into two parts: one was fixed in 10% formalin for IHC and the other
was treated with 4% paraformaldehyde/0.1 M PBS pH 7.4 in 0.1% DEPC for 1 h for ISH
analysis, and then embedded in paraffin. All paraffin blocks examined were then cut
into 5 µm sections and mounted on glass slides specific for IHC and ISH respectively
in the usual fashion. An H&E-stained section of each PCa was evaluated and assigned a
Gleason score by the experienced urological pathologist at our institution based on
the criteria of the Gleason score [2]. The Gleason sums are summarized in Table 1.
Clinical staging was performed according to the Jewett-Whitmore-Prout staging system,
as shown in Table 2. In the category of PIN, we graded the specimens into two groups,
i.e. low grade PIN (grade I - II) and high grade PIN (HGPIN, grade III), on the basis
of the literature [3,4]. 

Immunohistochemical (IHC) analysis 
Briefly, tissue sections were deparaffinized, dehydrated, and subjected to
microwaving in 10 mmol/L citrate buffer, pH 6.0 (Boshide, Wuhan, China) in a 900 W
oven for 5 min to induce epitope retrieval. Slides were allowed to cool at room
temperature for 30 min. A primary mouse antibody specific to human PSCA (Boshide,
Wuhan, China) at a 1:100 dilution was incubated with the slides at room temperature
for 2 h. Labeling was detected by sequentially adding biotinylated secondary
antibodies and streptavidin-peroxidase, and localized using the
3,3'-diaminobenzidine reaction. Sections were then counterstained with hematoxylin.
Substitution of the primary antibody with phosphate-buffered saline (PBS) served as a
negative-staining control. 

mRNA in situ hybridization (ISH) 

Five-µm-thick tissue sections were deparaffinized and dehydrated, then digested in
pepsin solution (4 mg/ml in 3% citric acid) for 20 min at 37.5 °C, and further
processed for ISH. Digoxigenin-labeled sense and antisense human PSCA RNA probes
(obtained from Boshide, Wuhan, China) were hybridized to the sections at 48 °C
overnight. The high-stringency posthybridization wash was performed sequentially at
37 °C in 2 × standard saline citrate (SSC) for 10 min, in 0.5 × SSC for 15 min and in
0.2 × SSC for 30 min. The slides were then incubated with biotinylated mouse
anti-digoxigenin antibody at 37.5 °C for 1 h followed by washing in 1 × PBS for
20 min at room temperature, and then with streptavidin-peroxidase at 37.5 °C for
20 min followed by washing in 1 × PBS for 15 min at room temperature. Subsequently,
the slides were developed with diaminobenzidine and then counterstained with
hematoxylin to localize the hybridization signals. Sections hybridized with the sense
control probes routinely did not show any specific hybridization signal above
background. Slides hybridized with PBS in place of the probes served as an additional
negative control. 

Table 1: Correlation of PSCA expression with Gleason score 

                   Intensity × frequency 
Gleason score      0-6 (%)       9 (%) 
2-4                5 (83)        1 (17) 
5-7                19 (79)       5 (21) 
8-10               5 (28)        13 (72) 

Table 2: Correlation of PSCA expression with clinical stage 

                   Intensity × frequency 
Tumor stage        0-6 (%)       9 (%) 
A-B                27 (67.5)     13 (32.5) 
C-D                2 (25)        6 (75) 

Scoring methods 

To determine the correlation between the results of PSCA immunostaining and mRNA in
situ hybridization, the same scoring scheme was used in the present study for PSCA
protein staining by IHC and PSCA mRNA staining by ISH. Each slide was read and scored
independently by two experienced urological pathologists using Olympus BX-41 light
microscopes. The evaluation was done in a blinded fashion. For each section, five
areas of similar grade were analyzed semiquantitatively for the fraction of cells
staining. Fifty percent of specimens were randomly chosen and rescored to determine
the degree of interobserver and intraobserver concordance. There was greater than 95%
intra- and interobserver agreement. 

The intensity of PSCA expression evaluated microscopically was graded on a scale of 0
to 3+, with 3+ being the highest expression observed (0, no staining; 1+, mildly
intense; 2+, moderately intense; 3+, severely intense). The staining density was
quantified as the percentage of cells staining positive for PSCA with the primary
antibody or hybridization probe, as follows: 0 = no staining; 1 = positive staining
in <25% of the sample; 2 = positive staining in 25%-50% of the sample; 3 = positive
staining in >50% of the sample. The intensity score (0 to 3+) was multiplied by the
density score (0-3) to give an overall score of 0-9 [1,5]. In this way, we were able
to differentiate specimens that may have had focal areas of increased staining from
those that had diffuse areas of increased staining [6]. The overall score for each
specimen was then categorically assigned to one of the following groups: a score of
0, negative expression; scores of 1-2, weak expression; scores of 3-6, moderate
expression; a score of 9, strong expression. 
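
The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated in the text, not the authors' software; the function names are hypothetical, and the treatment of a specimen with exactly 25% or 50% positive cells (assigned to density score 2) follows the "25%-50%" wording.

```python
def density_score(percent_positive):
    """Map the percentage of positively staining cells to the 0-3 density score."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_positive == 0:
        return 0          # no staining
    if percent_positive < 25:
        return 1          # <25% of the sample
    if percent_positive <= 50:
        return 2          # 25%-50% of the sample
    return 3              # >50% of the sample


def composite_score(intensity, percent_positive):
    """Multiply the intensity grade (0 to 3+) by the density score (0-3)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return intensity * density_score(percent_positive)


def expression_category(score):
    """Assign the overall score to the categories used in the text."""
    if score == 0:
        return "negative"
    if score in (1, 2):
        return "weak"
    if score in (3, 4, 6):
        return "moderate"
    if score == 9:
        return "strong"
    raise ValueError("not a possible product of two 0-3 scores")


# Example: 2+ intensity in 30% of cells gives 2 x 2 = 4, a moderate score.
score = composite_score(2, 30)
print(score, expression_category(score))
```

Because the overall score is a product of two values from 0 to 3, only 0, 1, 2, 3, 4, 6 and 9 can occur, which is why no category is needed for 7 or 8.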

Statistical analysis 

Intensity and density of PSCA protein and mRNA expression in BPH, PIN and PCa tissues
were compared using the Chi-square and Student's t-test. Univariate associations
between PSCA expression and Gleason score, clinical stage and progression to
androgen-independence were calculated using Fisher's Exact Test. For all analyses,
p < 0.05 was considered statistically significant. 
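
For readers who want to see how Fisher's Exact Test operates on a 2 × 2 table of counts, the following Python sketch computes a two-sided p-value from the hypergeometric distribution using only the standard library. It is an illustrative implementation, not the software used in the study; the function name is hypothetical, and the example counts are those listed in Table 2 (overall scores 0-6 versus 9 in the two stage groups).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Unnormalised integer weights avoid floating-point ties.
    weights = {
        x: comb(row1, x) * comb(row2, col1 - x)
        for x in range(lowest, highest + 1)
    }
    observed = weights[a]
    total = sum(weights.values())
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / total


# Counts from Table 2: first group 27 vs 13, second group 2 vs 6.
p_value = fisher_exact_two_sided(27, 13, 2, 6)
print(f"two-sided p = {p_value:.4f}")
```

Other conventions for the two-sided p-value exist (for example, doubling the one-sided value), so results from statistical packages may differ slightly depending on the definition they use.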

Results 

PSCA expression in BPH 

In general, PSCA protein and mRNA were expressed weakly in individual samples of BPH.
Some areas of prostate expressed weak levels (composite score 1-2), whereas other
areas were completely negative (composite score 0). Four cases (20%) of BPH had
moderate expression of PSCA protein and mRNA (composite score 4-6) by IHC and ISH. In
2/20 (10%) BPH specimens, PSCA mRNA expression was moderate (composite score 3-6),
but PSCA protein expression was weak (composite score 2) in one and negative
(composite score 0) in the other. PSCA expression was localized to the basal and
secretory epithelial cells, and the prostatic stroma was almost entirely negative for
PSCA protein and mRNA staining in all cases examined. 

PSCA expression in PIN 

In this study, we detected weak or negative expression of PSCA protein and mRNA
(scores ≤2) in 7 of 9 (77.8%) low grade PIN and in 2 of 11 (18.2%) HGPIN, and
moderate expression (scores of 3-6) in the remaining 2 low grade PIN and in 5 of 11
(45.5%) HGPIN. One HGPIN with moderate PSCA mRNA expression (score of 6) showed weak
staining for PSCA protein (score of 2) by IHC. Strong PSCA protein and mRNA
expression (score of 9) was detected in the remaining 3 of 11 (27.3%) HGPIN. There
was a statistically significant difference in PSCA protein and mRNA expression levels
between HGPIN and BPH (p < 0.05), but no statistically significant difference between
low grade PIN and BPH (p > 0.05). 

PSCA expression in PCa 

In order to determine if PSCA protein and mRNA can be detected in prostate cancers
and if PSCA expression levels are increased in malignant compared with benign glands,
forty-eight paraffin-embedded PCa specimens were analysed by IHC and ISH. It was
shown that 19 of 48 (39.6%) PCa samples stained very strongly for PSCA protein and
mRNA with a score of 9 and another 21 (43.8%) specimens displayed moderate staining
with scores of 4-6 (Figure 1). In addition, 4 specimens with moderate to strong PSCA
mRNA expression (scores of 4-9) had weak protein staining (a score of 2) by IHC
analyses. Overall, PCa expressed a significantly higher level of PSCA protein and
mRNA than any other specimen category in this study (p < 0.05, compared with BPH and
PIN respectively). The result demonstrates that PSCA protein and mRNA are
overexpressed by a majority of human PCa. 

Correlation of PSCA expression with Gleason score in PCa 

Using the semi-quantitative scoring method as described in Materials and Methods, we
compared the expression level of PSCA protein and mRNA with Gleason grade of PCa, as
shown in Table 1. Prostate adenocarcinomas were graded by Gleason score as 2-4 = well
differentiated, 5-7 = moderately differentiated and 8-10 = poorly differentiated [7].
Seventy-two percent of prostate cancers with Gleason scores of 8-10 had very strong
staining of PSCA compared to 21% with Gleason scores of 5-7 and 17% with scores of
2-4 respectively, demonstrating that poorly differentiated PCa had significantly
stronger expression of PSCA protein and mRNA than moderately and well differentiated
tumors (p < 0.05). As depicted in Figure 1, IHC and ISH analyses showed that PSCA
protein and mRNA expression in several cases of poorly differentiated PCa were
particularly prominent, with more intense and uniform staining. The results indicate
that PSCA expression increases significantly with higher tumor grade in human PCa. 

Correlation of PSCA expression with clinical stage in PCa 
The results for PSCA expression at each stage of PCa are shown in Table 2.
Seventy-five percent of locally advanced and node positive cancers (i.e. C-D stages)
expressed high levels of PSCA versus 32.5% of those that were organ confined (i.e.
A-B stages) (p < 0.05). The data demonstrate that PSCA expression increases
significantly with advanced tumor stage in human PCa. 

Correlation of PSCA expression with androgen-independent progression of PCa 

All 9 specimens of androgen-independent prostate cancers stained positive for PSCA
protein and mRNA. Eight specimens were obtained from patients managed prior to
androgen ablation therapy. Seven of eight (87.5%) of these androgen-independent
prostate cancers were in the strongest staining category (score = 9), compared with
three out of eight (37.5%) of patients with androgen-dependent cancers (p < 0.05).
The results demonstrate that PSCA expression increases significantly with progression
to androgen-independence of human PCa. 

It is evident from the results above that within a majority of human prostate cancers
the level of PSCA protein and mRNA expression correlates significantly with
increasing grade, worsening stage and progression to androgen-independence. 

Correlation of PSCA immunostaining and mRNA in situ hybridization 

In all 88 specimens surveyed herein, we compared the results of PSCA IHC staining
with mRNA ISH analysis. Positive staining areas and their intensity and density
scores evaluated by IHC were identical to those seen by ISH in 79 of 88 (89.8%)
specimens (18/20 BPH, 19/20 PIN and 42/48 PCa respectively). Importantly, 27/27
samples with PSCA mRNA composite scores of 0-2, 32/36 samples with scores of 3-6 and
22/24 samples with a score of 9 also had PSCA protein expression scores of 0-2, 3-6
and 9 respectively. However, in 5 samples with PSCA mRNA overall scores of 3-6 and in
2 with scores of 9 there was weaker or negative PSCA protein expression (i.e. scores
of 0-4), suggesting that this may reflect posttranscriptional modification of PSCA or
that the epitopes recognized by the PSCA mAb may be obscured in some cancers. The
data demonstrate that the results of PSCA immunostaining were consistent with those
of mRNA ISH analysis, showing a high degree of correlation between PSCA protein and
mRNA expression. 
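
The overall agreement figure quoted above follows directly from the per-tissue counts. The short Python sketch below reproduces that arithmetic; the variable and function names are hypothetical, and only the counts stated in the text (18/20 BPH, 19/20 PIN, 42/48 PCa) are used.

```python
# (specimens with identical IHC and ISH results, specimens examined)
concordant_by_tissue = {
    "BPH": (18, 20),
    "PIN": (19, 20),
    "PCa": (42, 48),
}


def overall_agreement(counts):
    """Return (agreeing specimens, total specimens, percent agreement)."""
    agreeing = sum(agree for agree, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return agreeing, total, 100 * agreeing / total


agreeing, total, percent = overall_agreement(concordant_by_tissue)
print(f"{agreeing} of {total} specimens ({percent:.1f}%)")
```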
Figure 1 

Representatives of PSCA IHC and ISH staining in PCa (A, IHC staining; B, ISH staining; ×200 magnification). A1, B1: negative
control of IHC and ISH. PBS replacing the primary antibody (A1) and hybridization with a sense PSCA probe (B1) showed no
background staining. A2, B2: a moderately differentiated PCa (Gleason score = 3+3 = 6) with moderate staining (composite score = 
6) in all malignant cells; A2: IHC shows not only cell surface but also apparent cytoplasmic staining of PSCA protein. A3, B3: a 
poorly differentiated PCa (Gleason score = 4+4 = 8) with very strong staining (composite score = 9) in all malignant cells. 






World Journal of Surgical Oncology 2004, 2 

http://www.wjso.com/content/2/1/13 



Discussion 

PSCA is homologous to a group of cell surface proteins that mark the earliest phase
of hematopoietic development. PSCA mRNA expression is prostate-specific in normal
male tissues and is highly up-regulated in both androgen-dependent and -independent
PCa xenografts (LAPC-4 tumors). We hypothesize that PSCA may play a role in PCa
tumorigenesis and progression, and may serve as a target for PCa diagnosis and
treatment. In this study, IHC and ISH showed that in general there was weak or absent
PSCA protein and mRNA expression in BPH and low grade PIN tissues. However, PSCA
protein and mRNA are widely expressed in HGPIN, the putative precursor of invasive
PCa, suggesting that up-regulation of PSCA is an early event in prostate
carcinogenesis. Recently, Reiter RE et al [1], using ISH analysis, reported that 97
of 118 (82%) HGPIN specimens stained strongly positive for PSCA mRNA. A very similar
finding was reported for mouse PSCA (mPSCA) expression in mouse HGPIN tissues by Tran
CP et al [8]. These data suggest that PSCA may be a new marker associated with
transformation of prostate cells and tumorigenesis. 

We observed that PSCA protein and mRNA are highly expressed in a large percentage of
human prostate cancers, including advanced, poorly differentiated,
androgen-independent and metastatic cases. Fluorescence-activated cell sorting and
confocal/immunofluorescent studies demonstrated cell surface expression of PSCA
protein in PCa cells [9]. Our IHC expression analysis of PSCA shows not only cell
surface but also apparent cytoplasmic staining of PSCA protein in PCa specimens
(Figure 1). One possible explanation for this is that anti-PSCA antibody can
recognize PSCA peptide precursors that reside in the cytoplasm. Also, it is possible
that the positive staining that appears in the cytoplasm is actually from the
overlying cell membrane [5]. These data seem to indicate that PSCA is a novel cell
surface marker for human PCa. 

Our results show that an elevated level of PSCA expression correlates with high grade
(i.e. poor differentiation), increased tumor stage and progression to
androgen-independence of PCa. These findings support the original IHC analyses by Gu
Z et al [9], who reported that PSCA protein was expressed in 94% of primary PCa and
that the intensity of PSCA protein expression increased with tumor grade, stage and
progression to androgen-independence. Our results also corroborate the recent work of
Han KR et al [10], in which a significant association was found between high PSCA
expression and adverse prognostic features such as high Gleason score, seminal
vesicle invasion and capsular involvement in PCa. It is suggested that PSCA
overexpression may be an adverse predictor for recurrence, clinical progression or
survival of PCa. Hara H et al [11] used RT-PCR detection of PSA, PSMA and PSCA in
1 ml of peripheral blood to evaluate PCa patients with poor prognosis. The results
showed that among 58 PCa patients, the prognostic value of the assays ranked in the
order PSCA > PSA > PSMA RT-PCR, and extraprostatic cases with positive PSCA PCR had
lower disease-progression-free survival than those with negative PSCA PCR,
demonstrating that PSCA can be used as a prognostic factor. Dubey P et al [12]
reported that elevated numbers of PSCA+ cells correlate positively with the onset and
development of prostate carcinoma over a long time span in the prostates of the TRAMP
and PTEN +/- models compared with normal prostates. Taken together with our present
findings, in which PSCA is overexpressed from HGPIN to frank carcinoma, it is
reasonable and possible to use increased PSCA expression level or increased numbers
of PSCA-positive cells in prostate samples as a prognostic marker to predict the
potential onset of this cancer. These data raise the possibility that PSCA may have
diagnostic utility or clinical prognostic value in human PCa. 

The cause of PSCA overexpression in PCa is not known. One possible mechanism is that
it may result from PSCA gene amplification. In humans, PSCA is located on chromosome
8q24.2 [1], which is often amplified in metastatic and recurrent PCa and considered
to indicate a poor prognosis [13-15]. Interestingly, PSCA is in close proximity to
the c-myc oncogene, which is amplified in >20% of recurrent and metastatic prostate
cancers [16,17]. Reiter RE et al [18] reported that PSCA and MYC gene copy numbers
were co-amplified in 25% of tumors (five out of twenty), demonstrating that PSCA
overexpression is associated with PSCA and MYC coamplification in PCa. Gu Z et al [9]
recently reported that in 102 specimens available to compare the results of PSCA
immunostaining with their previous mRNA ISH analysis, 92 (90.2%) had identically
positive areas of PSCA protein and mRNA expression. Taken together with our findings,
in which we detected moderate to strong expression of PSCA protein and mRNA in 34 of
40 (85%) PCa specimens examined simultaneously by IHC and ISH analyses, it is
demonstrated that PSCA protein and mRNA are overexpressed in human PCa, and that the
increased protein level of PSCA resulted from the upregulated transcription of its
mRNA. 

At present, the regulation mechanisms of human PSCA 
expression and its biological function are yet to be eluci- 
dated. PSCA expression may be regulated by multiple fac- 
tors [18], WatabeTet al [19] reported mat transcriptional 
control is a major component regulating PSCA expression 
levels. In addition, induction of PSCA expression may be 
regulated or mediated through cell-cell contact and pro- 
tein kinase C (PKC) [20]. Homologues of PSCA have 
diverse activities, and have themselves been involved in 
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carcinogenesis. Signalling through SCA-2 has been dem- 
onstrated to prevent apoptosis in immature thymocytes 
[21]. Thy-1 is involved in T cell activation and transducts 
signals through src-like tyrosine kinases [22]. Ly-6 genes 
have been implicated both in tumorigenesis and in cell- 
cell adhesion [23-25]. Cell-cell or cell-matrix interaction is 
critical for local tumor growth and spread to distal sites. 
From its restricted expression in basal cells of normal 
prostate and its homology to SCA-2, PSCA may play a role 
in stem/progenitor cell function, such as self-renewal (i.e. 
anti-apoptosis) and/or proliferation [1]. Taken together 
with the results in the present study, we speculate that 
PSCA may play a role in tumorigenesis and clinical pro-
gression of Pca through affecting cell transformation and
proliferation. From our results, it is also suggested that
PSCA as a new cell surface antigen may have a number of
potential uses in the diagnosis, therapy and clinical prog-
nosis of human Pca. PSCA overexpression in prostate
biopsies could be used to identify patients at high risk to 
develop recurrent or metastatic disease, and to discrimi- 
nate cancers from normal glands in prostatectomy sam- 
ples. Similarly, the detection of PSCA-overexpressing cells 
in bone marrow or peripheral blood may identify and pre- 
dict metastatic progression better than current assays, 
which identify only PSA-positive or PSMA-positive pros- 
tate cells. 

In summary, we have shown in this study that expression of
PSCA protein and mRNA is maintained from HGPIN through
all stages of Pca in a majority of cases, which may be
associated with prostate carcinogenesis and correlate
positively with high tumor grade (poor cell differentiation),
advanced stage and androgen-independent progression. PSCA
protein overexpression is due to the upregulation of its
mRNA transcription. The results suggest that PSCA may be a
promising molecular marker for the clinical prognosis of
human Pca and a valuable target for diagnosis and therapy
of this tumor.

Abstract

Translation initiation is regulated in response to
nutrient availability and mitogenic stimulation and is
coupled with cell cycle progression and cell growth.
Several alterations in translational control occur in
cancer. Variant mRNA sequences can alter the
translational efficiency of individual mRNA molecules,
which in turn play a role in cancer biology. Changes in
the expression or availability of components of the
translational machinery and in the activation of
translation through signal transduction pathways can
lead to more global changes, such as an increase in
the overall rate of protein synthesis and translational
activation of the mRNA molecules involved in cell
growth and proliferation. We review the basic
principles of translational control, the alterations
encountered in cancer, and selected therapies
targeting translation initiation to help elucidate new
therapeutic avenues.

Introduction 

The fundamental principle of molecular therapeutics in cancer
is to exploit the differences in gene expression between
cancer cells and normal cells. With the advent of cDNA array
technology, most efforts have concentrated on identifying
differences in gene expression at the level of mRNA, which
can be attributable either to DNA amplification or to
differences in transcription. Gene expression is quite
complicated, however, and is also regulated at the level of
mRNA stability, mRNA translation, and protein stability.

The power of translational regulation has been best recognized
among developmental biologists, because transcription
does not occur in early embryogenesis in eukaryotes. For
example, in Xenopus, the period of transcriptional quiescence
continues until the embryo reaches midblastula transition, the
4000-cell stage. Therefore, all necessary mRNA molecules are
transcribed during oogenesis and stockpiled in a translationally
inactive, masked form. The mRNA are translationally activated
at appropriate times during oocyte maturation, fertilization, and
early embryogenesis and thus are under strict translational
control.

Translation has an established role in cell growth. Basically,
an increase in protein synthesis occurs as a consequence
of mitogenesis. Until recently, however, little was
known about the alterations in mRNA translation in cancer,
and much is yet to be discovered about their role in the
development and progression of cancer. Here we review the
basic principles of translational control, the alterations
encountered in cancer, and selected therapies targeting
translation initiation to elucidate potential new therapeutic
avenues.

Basic Principles of Translational Control
Mechanism of Translation Initiation 
Translation initiation is the [...]
Translation initiation is a complex process in which initiator
tRNA and the 40S and 60S ribosomal subunits are recruited to
the 5' end of a mRNA molecule and assembled by eukaryotic
translation initiation factors into an 80S ribosome at the start
codon of the mRNA (Fig. 1). The 5' end of eukaryotic mRNA is
capped, i.e., contains the cap structure m7GpppN
(7-methylguanosine-triphospho-5'-ribonucleoside). Most
translation in eukaryotes occurs in a cap-dependent fashion,
i.e., the cap is specifically recognized by eIF4E,³ which binds
the 5' cap. The eIF4F translation initiation complex is then
formed by the assembly of eIF4E, the RNA helicase eIF4A, and
eIF4G, a scaffolding protein that mediates the binding of the
40S ribosomal subunit to the mRNA molecule through interaction
with the eIF3 protein present on the 40S ribosome. eIF4A and
eIF4B participate in melting the secondary structure of the
5' UTR of the mRNA. The 43S initiation complex
(40S/eIF2/Met-tRNA/GTP complex) scans the mRNA in a 5'→3'
direction until it encounters an AUG start codon. This start
codon is then base-paired to the anticodon of initiator tRNA,
forming the 48S initiation complex. The initiation factors are
then displaced from the 48S complex, and the 60S ribosome joins
to form the 80S ribosome.

Unlike most eukaryotic translation, translation initiation of
certain mRNAs, such as the picornavirus RNA, is cap independent
and occurs by internal ribosome entry. This mechanism
does not require eIF4E. Either the 43S complex can bind
the initiation codon directly through interaction with the IRES in
the 5' UTR, such as in the encephalomyocarditis virus, or it can
initially attach to the IRES and then reach the initiation codon by
scanning or transfer, as is the case with the poliovirus (1).

Regulation of Translation Initiation 
Translation initiation can be regulated by alterations in the
expression or phosphorylation status of the various factors
involved. Key components in translational regulation that
may provide potential therapeutic targets follow.

eIF4E. eIF4E plays a central role in translation regulation.
It is the least abundant of the initiation factors and is
considered the rate-limiting component for initiation of
cap-dependent translation. eIF4E may also be involved in mRNA
splicing, mRNA 3' processing, and mRNA nucleocytoplasmic
transport (2). eIF4E expression can be increased at the
transcriptional level in response to serum or growth factors
(3). eIF4E overexpression may cause preferential translation
of mRNAs containing excessive secondary structure in their
5' UTR that are normally discriminated against by the
translational machinery and thus are inefficiently translated
(4-7). As examples of this, overexpression of eIF4E promotes
increased translation of vascular endothelial growth factor,
fibroblast growth factor-2, and cyclin D1 (2, 8, 9).

Another mechanism of control is the regulation of eIF4E
phosphorylation. eIF4E phosphorylation is mediated by the
mitogen-activated protein kinase-interacting kinase 1, which
is activated by the mitogen-activated pathway activating
extracellular signal-regulated kinases and the stress-activated
pathway acting through p38 mitogen-activated protein kinase
(10-13). Several mitogens, such as serum, platelet-derived
growth factor, epidermal growth factor, insulin,
angiotensin II, src kinase overexpression, and ras
overexpression, lead to eIF4E phosphorylation (14). The
phosphorylation status of eIF4E is usually correlated with the
translational rate and growth status of the cell; however,
eIF4E phosphorylation has also been observed in response
to some cellular stresses when translational rates actually
decrease (15). Thus, further study is needed to understand
the effects of eIF4E phosphorylation on eIF4E activity.

Another mechanism of regulation is the alteration of eIF4E
availability by the binding of eIF4E to the eIF4E-binding
proteins (4E-BP, also known as PHAS-I). 4E-BPs compete with
eIF4G for a binding site in eIF4E. The binding of eIF4E to the
best characterized eIF4E-binding protein, 4E-BP1, is regulated
by 4E-BP1 phosphorylation. Hypophosphorylated 4E-BP1
binds to eIF4E, whereas 4E-BP1 hyperphosphorylation
decreases this binding. Insulin, angiotensin, epidermal
growth factor, platelet-derived growth factor, hepatocyte
growth factor, nerve growth factor, insulin-like growth factors
I and II, interleukin 3, granulocyte-macrophage
colony-stimulating factor + steel factor, gastrin, and the
adenovirus have all been reported to induce phosphorylation of
4E-BP1 and to decrease the ability of 4E-BP1 to bind eIF4E
(15, 16). Conversely, deprivation of nutrients or growth factors
results in 4E-BP1 dephosphorylation, an increase in eIF4E
binding, and a decrease in cap-dependent translation.

p70S6 Kinase. Phosphorylation of ribosomal 40S protein
S6 by S6K is thought to play an important role in translational
regulation. S6K-/- mouse embryonic cells proliferate more
slowly than do parental cells, demonstrating that S6K has a
positive influence on cell proliferation (17). S6K regulates the
translation of a group of mRNAs possessing a 5' terminal
oligopyrimidine tract (5' TOP) found at the 5' UTR of ribosomal
protein mRNAs and other mRNAs coding for components of
the translational machinery. Phosphorylation of S6K is regulated
in part based on the availability of nutrients (18, 19) and is
stimulated by several growth factors, such as platelet-derived
growth factor and insulin-like growth factor I (20).

eIF2α Phosphorylation. The binding of the initiator tRNA
to the small ribosomal unit is mediated by translation
initiation factor eIF2. Phosphorylation of the α-subunit of eIF2
prevents formation of the eIF2/GTP/Met-tRNA complex and
inhibits global protein synthesis (21, 22). eIF2α is
phosphorylated under a variety of conditions, such as viral
infection, nutrient deprivation, heme deprivation, and apoptosis
(22). eIF2α is phosphorylated by heme-regulated inhibitor,
nutrient-regulated protein kinase, and the IFN-induced,
double-stranded RNA-activated protein kinase (PKR; Ref. 23).

The mTOR Signaling Pathway. The macrolide antibiotic
rapamycin (Sirolimus; Wyeth-Ayerst Research, Collegeville,
PA) has been the subject of intensive study because it
inhibits signal transduction pathways involved in T-cell
activation. The rapamycin-sensitive component of these pathways
is mTOR (also called FRAP or RAFT1). mTOR is the
mammalian homologue of the yeast TOR proteins that regulate G1
progression and translation in response to nutrient
availability (24). mTOR is a serine-threonine kinase that
modulates translation initiation by altering the phosphorylation
status of 4E-BP1 and S6K (Fig. 2; Ref. 25).

4E-BP1 is phosphorylated on multiple residues. mTOR
phosphorylates the Thr-37 and Thr-46 residues of 4E-BP1 in vitro
(26); however, phosphorylation at these sites is not associated
with a loss of eIF4E binding. Phosphorylation of Thr-37 and
Thr-46 is required for subsequent phosphorylation at several
COOH-terminal, serum-sensitive sites; a combination of these
phosphorylation events appears to be needed to inhibit the
binding of 4E-BP1 to eIF4E (25). The product of the ATM gene,
the p38/MSK1 pathway, and protein kinase C also play a role in
4E-BP1 phosphorylation (27-29).

S6K and 4E-BP1 are also regulated, in part, by PI3K and its
downstream protein kinase Akt. PTEN is a phosphatase that
negatively regulates PI3K signaling. PTEN null cells have
constitutive activation of Akt, with increased S6K activity and
S6 phosphorylation (30). S6K activity is inhibited both by
PI3K inhibitors wortmannin and LY294002 and by mTOR
inhibitor rapamycin (24). Akt phosphorylates Ser-2448 in
mTOR in vitro, and this site is phosphorylated upon Akt
activation in vivo (31-33). Thus, mTOR is regulated by the
PI3K/Akt pathway; however, this does not appear to be the
only mode of regulation of mTOR activity. Whether the PI3K
pathway also regulates S6K and 4E-BP1 phosphorylation
independent of mTOR is controversial.

Interestingly, mTOR autophosphorylation is blocked by
wortmannin but not by rapamycin, which
suggests that mTOR-responsive regulation of 4E-BP1 and S6K
activity occurs through a mechanism other than intrinsic mTOR
kinase activity. An alternate pathway for 4E-BP1 and S6K
phosphorylation by mTOR activity is by the inhibition of a
phosphatase. Treatment with calyculin A, an inhibitor of
phosphatases 1 and 2A, reduces rapamycin-induced
dephosphorylation of 4E-BP1 and S6K (35). PP2A interacts with
full-length S6K but not with a S6K mutant that is resistant to
dephosphorylation resulting from rapamycin. mTOR phosphorylates
PP2A in vitro; however, how this process alters PP2A activity is
not known. These results are consistent with the model that
phosphorylation of a phosphatase by mTOR prevents
dephosphorylation of 4E-BP1 and S6K, and conversely, that
nutrient deprivation and rapamycin block inhibition of the
phosphatase by mTOR.

Polyadenylation. The poly(A) tail in eukaryotic mRNA is
important in enhancing translation initiation and mRNA
stability. Polyadenylation plays a key role in regulating gene
expression during oogenesis and early embryogenesis.
Some mRNA that are translationally inactive in the oocyte are
polyadenylated concomitantly with translational activation in
oocyte maturation, whereas other mRNAs that are
translationally active during oogenesis are deadenylated and
translationally silenced (36-38). Thus, control of poly(A) tail
synthesis is an important regulatory step in gene expression.
The 5' cap and poly(A) tail are thought to function
synergistically to regulate mRNA translational efficiency
(39, 40).

RNA Packaging. Most RNA-binding proteins are assembled
on a transcript at the time of transcription, thus
determining the translational fate of the transcript (41). A
highly conserved family of Y-box proteins is found in cytoplasmic
messenger ribonucleoprotein particles, where the proteins
are thought to play a role in restricting the recruitment of
mRNA to the translational machinery (41-43). The major
mRNA-associated protein, YB-1, destabilizes the interaction
of eIF4E and the 5' mRNA cap in vitro, and overexpression of
YB-1 results in translational repression in vivo (44). Thus,
alterations in RNA packaging can also play an important role
in translational regulation.

Translation Alterations Encountered in Cancer 

Three main alterations at the translational level occur in cancer:
variations in mRNA sequences that increase or decrease
translational efficiency, changes in the expression or
availability of components of the translational machinery, and
activation of translation through aberrantly activated signal
transduction pathways. The first alteration affects the
translation of an individual mRNA that may play a role in
carcinogenesis. The second and third alterations can lead to
more global changes, such as an increase in the overall rate of
protein synthesis, and the translational activation of several
mRNA species.

Variations in mRNA Sequence 
Variations in mRNA sequence affect the translational
efficiency of the transcript. A brief description of these
variations and examples of each mechanism follow.

Mutations. Mutations in the mRNA sequence, especially
in the 5' UTR, can alter its translational efficiency, as seen in
the following examples.

c-myc. Saito et al. proposed that translation of full-length
c-myc is repressed, whereas in several Burkitt lymphomas
that have deletions of the mRNA 5' UTR, translation of c-myc
is more efficient (45). More recently, it was reported that the
5' UTR of c-myc contains an IRES, and thus c-myc translation
can be initiated by a cap-independent as well as a
cap-dependent mechanism (46, 47). In patients with multiple
myeloma, a C→T mutation in the c-myc IRES was identified
(48) and found to cause an enhanced initiation of translation
via internal ribosomal entry (49).

BRCA1. A somatic point mutation (117 G→C) in position
-3 with respect to the start codon of the BRCA1 gene was
identified in a highly aggressive sporadic breast cancer (50).
Chimeric constructs consisting of the wild-type or mutated
BRCA1 5' UTR and a downstream luciferase reporter
demonstrated a decrease in the translational efficiency with the
5' UTR mutation.

Cyclin-dependent Kinase Inhibitor 2A. Some inherited
melanoma kindreds have a G→T transversion at base -34
of cyclin-dependent kinase inhibitor-2A, which encodes a
cyclin-dependent kinase 4/cyclin-dependent kinase 6 kinase
inhibitor important in G1 checkpoint regulation (51). This
mutation gives rise to a novel AUG translation initiation
codon, creating an upstream open reading frame that competes
for scanning ribosomes and decreases translation
from the wild-type AUG.
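
The competition for scanning ribosomes described above can be illustrated with a minimal sketch. It assumes the simplest form of the scanning model, in which initiation occurs at the first AUG met in the 5'→3' direction, and it ignores sequence context, leaky scanning, and reinitiation. The sequences are invented for illustration and are not the real cyclin-dependent kinase inhibitor-2A leader; the function name first_aug is likewise hypothetical.

```python
def first_aug(mrna: str) -> int:
    """Return the 0-based index of the first AUG met when scanning 5'->3', or -1."""
    return mrna.upper().find("AUG")


# Hypothetical transcript: a 5' UTR that contains no AUG, followed by the main ORF.
utr_wild_type = "GGCACGAGGCAGCAAGGCGGCC"
main_orf = "AUGGAGCCGGCGUAA"

# Hypothetical G->U change (G->T in the DNA) that turns the first "AGG" into "AUG".
position = utr_wild_type.index("AGG") + 1
utr_mutant = utr_wild_type[:position] + "U" + utr_wild_type[position + 1:]

wild_type = utr_wild_type + main_orf
mutant = utr_mutant + main_orf

# Wild type: scanning reaches the main start codon first.
print(first_aug(wild_type) == len(utr_wild_type))
# Mutant: scanning meets the new upstream AUG before the main start codon.
print(first_aug(mutant) < len(utr_mutant))
```

In this toy model every ribosome stops at the upstream AUG; in a cell only a fraction would, which is why the text speaks of decreased rather than abolished translation from the wild-type AUG.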

Alternate Splicing and Alternate Transcription Start
Sites. Alterations in splicing and alternate transcription sites
can lead to variation in 5' UTR sequence and secondary
structure, ultimately impacting translational efficiency.

ATM. The ATM gene has four noncoding exons in its 5'
UTR that undergo extensive alternative splicing (52). The
contents of 12 different 5' UTRs that show considerable
diversity in length and sequence have been identified. These
divergent 5' leader sequences play an important role in the
translational regulation of the ATM gene.

mdm2. In a subset of tumors, overexpression of the
oncoprotein mdm2 results in enhanced translation of the mdm2
mRNA. Use of different promoters leads to two mdm2
transcripts that differ only in their 5' leaders (53). The longer
5' UTR contains two upstream open reading frames, and this
mRNA is loaded with ribosomes inefficiently compared with
the short 5' UTR.

BRCA1. In a normal mammary gland, BRCA1 mRNA is
expressed with a shorter leader sequence (5' UTRa), whereas
in sporadic breast cancer tissue, BRCA1 mRNA is expressed
with a longer leader sequence (5' UTRb); the translational
efficiency of transcripts containing 5' UTRb is 10 times lower
than that of transcripts containing 5' UTRa (54).

TGF-β3. TGF-β3 mRNA includes a 1.1-kb 5' UTR, which
exerts an inhibitory effect on translation. Many human breast
cancer cell lines contain a novel TGF-β3 transcript with a 5'
UTR that is 870 nucleotides shorter and has a 7-fold greater
translational efficiency than the normal TGF-β3 mRNA (55).

Alternate Polyadenylation Sites. Multiple polyadenylation
signals leading to the generation of several transcripts
with differing 3' UTR have been described for several mRNA
species, such as the RET proto-oncogene (56), ATM gene
(52), tissue inhibitor of metalloproteinases-3 (57), RHOA
proto-oncogene (58), and calmodulin-I (59). Although the
effect of these alternate 3' UTRs on translation is not yet
known, they may be important in RNA-protein interactions
that affect translational recruitment. The role of these
alterations in cancer development and progression is unknown.



Alterations in the Components of the
Translation Machinery

Alterations in the components of translation machinery can
take many forms.

Overexpression of eIF4E. Overexpression of eIF4E
causes malignant transformation in rodent cells (60) and the
deregulation of HeLa cell growth (61). Polunovsky et al. (62)
found that eIF4E overexpression substitutes for serum and
individual growth factors in preserving viability of fibroblasts,
which suggests that eIF4E can mediate both proliferative and
survival signaling.

Elevated levels of eIF4E mRNA have been found in a broad
spectrum of transformed cell lines (63). eIF4E levels are
elevated in all ductal carcinoma in situ specimens and
invasive ductal carcinomas, compared with benign breast
specimens evaluated with Western blot analysis (64, 65).
Preliminary studies suggest that this overexpression is
attributable to gene amplification (66).

There are accumulating data suggesting that eIF4E
overexpression can be valuable as a prognostic marker. eIF4E
overexpression was found in a retrospective study to be a marker
of poor prognosis in stages I to III breast carcinoma (67).
Verification of the prognostic value of eIF4E in breast cancer
is now under way in a prospective trial (67). However, in a
different study, eIF4E expression was correlated with the
aggressive behavior of non-Hodgkin's lymphomas (68). In a
prospective analysis of patients with head and neck cancer,
elevated levels of eIF4E in histologically tumor-free surgical
margins predicted a significantly increased risk of locoregional
recurrence (9). These results all suggest that eIF4E
overexpression can be used to select patients who might benefit
from systemic therapy. Furthermore, the head and neck cancer
findings suggest that eIF4E overexpression is a field defect and
can be used to guide local therapy.

Alterations in Other Initiation Factors. Alterations in a
number of other initiation factors have been associated with
cancer. Overproduction of eIF4G, similar to eIF4E, leads to
malignant transformation in vitro (69). eIF-2α is found in
increased levels in bronchioloalveolar carcinomas of the lung
(3). Initiation factor eIF-4A1 is overexpressed in melanoma
(70) and hepatocellular carcinoma (71). The p40 subunit of
translation initiation factor 3 is amplified and overexpressed
in breast and prostate cancer (72), and the eIF3-p110 subunit
is overexpressed in testicular seminoma (73). The role that
overexpression of these initiation factors plays in the
development and progression of cancer, if any, is not known.

Overexpression of S6K. S6K is amplified and highly
overexpressed in the MCF7 breast cancer cell line, compared
with normal mammary epithelium (74). In a study by
Barlund et al. (74), S6K was amplified in 59 of 668 primary
breast tumors, and a statistically significant association was
observed between amplification and poor prognosis.

Overexpression of PAP. PAP catalyzes 3' poly(A)
synthesis. PAP is overexpressed in human cancer cells
compared with normal and virally transformed cells (75). PAP
enzymatic activity in breast tumors has been correlated with
PAP protein levels (76) and, in mammary tumor cytosols, was
found to be an independent factor for predicting survival (76).
Little is known, however, about how PAP expression or
activity affects the translational profile.

Alterations in RNA-binding Proteins. Even less is known
about alterations in RNA packaging in cancer. Increased
expression and nuclear localization of the RNA-binding protein
YB-1 are indicators of a poor prognosis for
non-small cell lung cancer (78) and ovarian cancer (79).
However, this effect may be mediated at least in part at the
level of transcription, because YB-1 increases chemoresistance
by enhancing the transcription of a multidrug resistance gene
(80).



Activation of Signal Transduction Pathways 
Activation of signal transduction pathways by loss of tumor
suppressor genes or overexpression of certain tyrosine kinases
can contribute to the growth and aggressiveness of tumors. An
important mutant in human cancers is the tumor suppressor
gene PTEN, which leads to activation of the PI3K pathway.
Activation of PI3K and Akt induces the oncogenic transformation
of chicken embryo fibroblasts. The transformed cells
show constitutive phosphorylation of S6K and 4E-BP1 (81).
A mutant Akt that retains kinase activity but does not
phosphorylate S6K or 4E-BP1 does not transform fibroblasts, which
suggests a correlation between the oncogenicity of PI3K and
Akt and the phosphorylation of S6K and 4E-BP1 (81).

Several tyrosine kinases such as platelet-derived growth
factor, insulin-like growth factor, HER2/neu, and epidermal
growth factor receptor are overexpressed in cancer. Because
these kinases activate downstream signal transduction
pathways known to alter translation initiation, activation
of translation is likely to contribute to the growth and
aggressiveness of these tumors. Furthermore, the mRNA for many
of these kinases themselves are under translational control.
For example, HER2/neu mRNA is translationally controlled
both by a short upstream open reading frame that represses
HER2/neu translation in a cell type-independent manner and
by a distinct cell type-dependent mechanism that increases
translational efficiency (82). HER2/neu translation is different
in transformed and normal cells. Thus, it is possible that
alterations at the translational level can in part account for
the discrepancy between HER2/neu gene amplification
detected by fluorescence in situ hybridization and protein levels
detected by immunohistochemical assays.



Translation Targets of Selected Cancer Therapy 
Components of the translation machinery and signal pathways
involved in the activation of translation initiation
represent good targets for cancer therapy.

Targeting the mTOR Signaling Pathway: Rapamycin
and Tumstatin

Rapamycin inhibits the proliferation of lymphocytes. It was
initially developed as an immunosuppressive drug for organ
transplantation. Rapamycin with FKBP 12 (FK506-binding
protein, Mr 12,000) binds to mTOR to inhibit its function.

Rapamycin causes a small but significant reduction in the
initiation rate of protein synthesis (83). It blocks cell growth in
part by blocking S6 phosphorylation and selectively
suppressing the translation of 5' TOP mRNAs, such as ribosomal
proteins and elongation factors (83-85). Rapamycin also
blocks 4E-BP1 phosphorylation and inhibits cap-dependent
but not cap-independent translation (17, 86).

The rapamycin-sensitive signal transduction pathway,
activated during malignant transformation, is now being studied
as a target for cancer therapy. Prostate, breast, small cell
lung, glioblastoma, melanoma, and T-cell leukemia are among the
cancer lines most sensitive to the rapamycin analogue CCI-779
(Wyeth-Ayerst Research; Ref. 87). In rhabdomyosarcoma cell
lines, rapamycin is either cytostatic or cytocidal, depending on
the p53 status of the cell; p53 wild-type cells treated with
rapamycin arrest in the G1 phase and maintain their viability,
whereas p53 mutant cells accumulate in G1 and undergo
apoptosis (88, 89). In a recently reported study using human
primitive neuroectodermal tumor and medulloblastoma models,
rapamycin exhibited more cytotoxicity in combination with
cisplatin and camptothecin than as a single agent. In vivo,
CCI-779 delayed growth of xenografts by 160% after 1 week of
therapy and 240% after 2 weeks. A single high-dose
administration caused a 37% decrease in tumor volume. Growth
inhibition in vivo was 1.3 times greater with cisplatin in
combination with CCI-779 than with cisplatin alone (90). Thus,
preclinical studies suggest that rapamycin analogues are useful
as single agents and in combination with chemotherapy.

Rapamycin analogues CCI-779 and RAD001 (Novartis,
Basel, Switzerland) are now in clinical trials. Because of the
known effect of rapamycin on lymphocyte proliferation, a
potential problem with rapamycin analogues is
immunosuppression. However, although prolonged immunosuppression
can result from rapamycin and CCI-779 administered on
continuous-dose schedules, the immunosuppressive effects
of rapamycin analogues resolve in ~24 h after therapy
(91). The principal toxicities of CCI-779 have included
dermatological toxicity, myelosuppression, infection, mucositis,
diarrhea, reversible elevations in liver function tests,
hyperglycemia, hypokalemia, hypocalcemia, and depression (87,
92-94). Phase II trials of CCI-779 have been conducted in
advanced renal cell carcinoma and in stage III/IV breast
carcinoma patients who failed with prior chemotherapy. In
the results reported in abstract form, although there were no
complete responses, partial responses were documented in
both renal cell carcinoma and in breast carcinoma (94, 95).
Thus, CCI-779 has documented preliminary clinical activity in
a previously treated, unselected patient population.

Active investigation is under way into patient selection for
mTOR inhibitors. Several studies have found an enhanced
efficacy of CCI-779 in PTEN-null tumors (30, 96). Another
study found that six of eight breast cancer cell lines were
responsive to CCI-779, although only two of these lines
lacked PTEN (97). There was, however, a positive correlation
between Akt activation and CCI-779 sensitivity (97). This
correlation suggests that activation of the PI3K-Akt pathway,
regardless of whether it is attributable to a PTEN mutation or
to overexpression of receptor tyrosine kinases, makes cancer
cells amenable to mTOR-directed therapy. In contrast,
lower levels of the target of mTOR, 4E-BP1, are associated
with rapamycin resistance; thus, a lower 4E-BP1/eIF4E ratio
may predict rapamycin resistance (98).

Another mode of activity for rapamycin and its analogues
appears to be through inhibition of angiogenesis. This
activity may be both through direct inhibition of endothelial
cell proliferation as a result of mTOR inhibition in these cells
and by inhibition of translation of such proangiogenic factors
as vascular endothelial growth factor in tumor cells (99, 100).

The angiogenesis inhibitor tumstatin, another anticancer
drug currently under study, was also found recently to inhibit
translation in endothelial cells (101). Through a requisite
interaction with integrin, tumstatin inhibits activation of the
PI3K/Akt pathway and mTOR in endothelial cells and
prevents dissociation of eIF4E from 4E-BP1, thereby inhibiting
cap-dependent translation. These findings suggest that
endothelial cells are especially sensitive to therapies targeting
the mTOR-signaling pathway.

Targeting eIF2α: EPA, Clotrimazole, mda-7,
and Flavonoids

EPA is an n-3 polyunsaturated fatty acid found in the
fish-based diets of populations having a low incidence of cancer
(102). EPA inhibits the proliferation of cancer cells (103),
including in animal models (104, 105). It blocks cell division by
inhibiting translation initiation (105). EPA releases Ca2+ from
intracellular stores while inhibiting their refilling, thereby
activating PKR. PKR, in turn, phosphorylates and inhibits eIF2α,
resulting in the inhibition of protein synthesis at the level of
translation initiation. Similarly, clotrimazole, a potent
antiproliferative agent in vitro and in vivo, inhibits cell growth
through depletion of Ca2+ stores, activation of PKR, and
phosphorylation of eIF2α (106). Consequently, clotrimazole
preferentially decreases the expression of cyclins A, E, and D1,
resulting in blockage of the cell cycle in G1.

mda-7 is a novel tumor suppressor gene being developed
as a gene therapy agent. Adenoviral transfer of mda-7
(Ad-mda7) induces apoptosis in many cancer cells including
breast, colorectal, and lung cancer (107-109). Ad-mda7 also
induces and activates PKR, which leads to phosphorylation
of eIF2α and induction of apoptosis (110).

Flavonoids such as genistein and quercetin suppress
tumor cell growth. All three mammalian eIF2α kinases, PKR,
heme-regulated inhibitor, and PERK/PEK, are activated by
flavonoids, with phosphorylation of eIF2α and inhibition of
protein synthesis (111).



Targeting e!F4A and elF4B Antisense RNA 
and Peptides 

Antisense expression of eIF4A decreases the proliferation rate
of melanoma cells (112). Sequestration of eIF4E by
overexpression of 4E-BP1 is proapoptotic and decreases
tumorigenicity (113, 114). Reduction of eIF4E with antisense RNA
decreases soft agar growth, increases tumor latency, and
increases the rates of tumor doubling times (7). Antisense eIF4E
RNA treatment also reduces the expression of angiogenic factors
(115) and has been proposed as a potential adjuvant therapy for
head and neck cancers, particularly when elevated eIF4E is found
in surgical margins. Small molecule inhibitors that bind the
eIF4G/4E-BP1-binding domain of eIF4E are proapoptotic (116) and
are also being actively pursued.



Exploiting Selective Translation for Gene Therapy

A different therapeutic approach that takes advantage of the
enhanced cap-dependent translation in cancer cells is the use
of gene therapy vectors encoding suicide genes with highly
structured 5' UTR. These mRNA would thus be at a competitive
disadvantage in normal cells and not translate well, whereas in
cancer cells, they would translate more efficiently. For example,
the introduction of the 5' UTR of fibroblast growth factor-2 5' to
the coding sequence of herpes simplex virus type-1 thymidine
kinase gene allows for selective translation of herpes simplex
virus type-1 thymidine kinase gene in breast cancer cell lines
compared with normal mammary cell lines and results in
selective sensitivity to ganciclovir (117).



Toward the Future 

Translation is a crucial process in every cell. However, several
alterations in translational control occur in cancer. Cancer cells
appear to need an aberrantly activated translational state for
survival, thus allowing the targeting of translation initiation with
surprisingly low toxicity. Components of the translational
machinery, such as eIF4E, and signal transduction pathways
involved in translation initiation, such as mTOR, represent promising
targets for cancer therapy. Inhibitors of the mTOR have already
shown some preliminary activity in clinical trials. It is possible
that with the development of better predictive markers and
better patient selection, response rates to single-agent therapy
can be improved. Similar to other cytostatic agents, however,
mTOR inhibitors are most likely to achieve clinical utility in
combination therapy. In the interim, our increasing understanding
of translation initiation and signal transduction pathways
promises to lead to the identification of new therapeutic targets
in the near future.
$55

Figure: RNA Ladder gel image with band sizes labeled in bases
(9,000 and 7,000 labels legible). 1 µg of RNA Ladder was heated
at 60°C for 3 minutes in 1X RNA Ladder Sample Buffer and
visualized by ethidium bromide staining (1.0% TBE agarose gel).



Description: The RNA Ladder is a set of 7 RNA 
molecules produced by in vitro transcription of a 
mixture of 7 linear DNA templates. The ladder sizes are: 
9000, 7000, 5000, 3000, 2000, 1000 and 500 bases. 
The 3000 base fragment is at double intensity to serve 
as a reference band. This ladder is suitable for use as an 
RNA size standard on denaturing or native agarose gels. 
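
As a rough illustration of using the ladder as a size standard, the sketch below estimates the size of an unknown band by interpolating log10(size) against migration distance between the two flanking ladder bands. The ladder sizes are those listed above. The migration distances and the function name estimate_size are hypothetical, and the log-linear relationship is only an approximation that holds over a limited size range.

```python
import math

# Ladder sizes in bases, as listed in the description above.
LADDER_SIZES = [9000, 7000, 5000, 3000, 2000, 1000, 500]


def estimate_size(ladder_mm, unknown_mm, sizes=LADDER_SIZES):
    """Estimate fragment size by log-linear interpolation between flanking bands.

    ladder_mm  -- migration distance of each ladder band (mm), largest band first
    unknown_mm -- migration distance of the unknown band (mm)
    """
    pairs = list(zip(ladder_mm, sizes))
    for (d1, s1), (d2, s2) in zip(pairs, pairs[1:]):
        if d1 <= unknown_mm <= d2:
            frac = (unknown_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("unknown band lies outside the range of the ladder")


# Hypothetical migration distances (mm), for illustration only.
example_mm = [10.0, 13.0, 17.0, 23.0, 28.0, 37.0, 46.0]
print(round(estimate_size(example_mm, 25.5)))
```

A band that falls outside the ladder raises an error rather than being extrapolated, since the approximation is least reliable at the extremes.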

Reagents Supplied with Ladders: 

2X RNA Ladder Sample Buffer (for use with native 
agarose gels) 

2X RNA Ladder Sample Buffer: 

2X TBE (pH 8.3), 13% ficoll (w/v), 0.01% bromophenol 
blue and 7 M urea. 

Concentration: 500 µg/ml.

Storage Conditions: 20 mM KOAc (pH 4.5). Store
at -70°C. For short term storage (< 1 week), ladder can
be stored at -20°C.

Notes on Use: 

To avoid ribonuclease contamination: wear gloves, use 
RNase-free water for gels and buffers, wash equipment 
with detergent and rinse thoroughly with RNase-free 
water. 

It is best to use freshly poured gels that are as thin as
possible (i.e. 2-10 mm). Excessively long run times or
high voltage can cause degradation of the bands on
the gel. We recommend 4-8 volts/cm and running the
bromophenol blue approximately 5 cm into the gel for
good resolution.
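
As a minimal sketch of the 4-8 volts/cm guideline, the voltage range for a given gel box can be worked out as below. It assumes that volts/cm refers to the distance between the electrodes, which is the usual convention but is not stated here. The function name and the 15 cm example are hypothetical.

```python
def recommended_voltage_range(electrode_distance_cm,
                              low_v_per_cm=4.0, high_v_per_cm=8.0):
    """Return (min_volts, max_volts) for a field strength of 4-8 V/cm.

    electrode_distance_cm is assumed to be the distance between the
    electrodes of the gel box, not the length of the gel itself.
    """
    if electrode_distance_cm <= 0:
        raise ValueError("electrode distance must be positive")
    return (low_v_per_cm * electrode_distance_cm,
            high_v_per_cm * electrode_distance_cm)


# Hypothetical gel box with 15 cm between electrodes.
low, high = recommended_voltage_range(15)
print(f"Run between {low:.0f} V and {high:.0f} V")
```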

Adding ethidium bromide to agarose gels and running
buffer at a final concentration of 0.5 µg/ml will
effectively stain the bands during electrophoresis.
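
The amount of ethidium bromide stock needed to reach a final concentration of 0.5 µg/ml follows from a simple dilution calculation, sketched below. The 10 mg/ml stock concentration and the 100 ml gel volume are assumptions for illustration, not values given here, and the function name is hypothetical.

```python
def stock_volume_ul(final_ug_per_ml, total_volume_ml, stock_mg_per_ml):
    """Volume of stock (in µl) needed to reach the final concentration.

    Uses the identity 1 mg/ml == 1 µg/µl, so the stock concentration in
    mg/ml can be used directly as µg per µl.
    """
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    total_ug = final_ug_per_ml * total_volume_ml  # µg of dye required
    stock_ug_per_ul = stock_mg_per_ml
    return total_ug / stock_ug_per_ul


# Assumed example: 100 ml of gel, 10 mg/ml stock, 0.5 µg/ml final.
print(stock_volume_ul(0.5, 100, 10), "µl of stock")
```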

Denaturing vs. Native Agarose Gels: 

It is common practice to electrophorese RNA on a fully 
denaturing agarose gel, such as one containing 
formaldehyde (1). However, in many cases it is possible 
to run RNA on a native agarose gel and obtain suitable 
results. In fact, it has been demonstrated that treatment 
of RNA samples in a denaturing buffer maintains the 
RNA molecules in a denatured state, during electro- 
phoresis, for at least 3 hours (2,3). The use of native 
agarose gels eliminates problems associated with toxic 
chemicals and the difficulties encountered when staining 
and blotting formaldehyde gels. 

References: 

(1) Sambrook, J., Fritsch, E. F. and Maniatis, T. (1989). 
Molecular Cloning: A Laboratory Manual, (2nd ed.), 
(pp. 7.43-7.45). Cold Spring Harbor: Cold Spring 
Harbor Laboratory Press. 

(2) Liu, Y-C, Chou, Y-C. (1990) Biotechniques 9, 558. 

(3) Sandra Cook and Christina Marchetti, unpublished 
observations. 



2-Log DNA Ladder (0.1-10.0 kb)



Catalog #    Size      Price
#N3200S      100 µg    $55
#N3200L      500 µg    $220



Figure: 2-Log DNA Ladder gel image, with each band labeled by
DNA mass (ng) and size (kilobases), from 10.0 kb to 0.1 kb.
1.0 µg of 2-Log DNA Ladder visualized by ethidium bromide
staining on a 1.0% TBE agarose gel.



Description: A number of proprietary plasmids are 
digested to completion with appropriate restriction 
enzymes to yield 19 bands suitable for use as molecular 
weight standards for agarose gel electrophoresis. This 
digested DNA includes fragments ranging from 100 bp 
to 10 kb. The 0.5, 1.0 and 3.0 kb bands have increased 
intensity to serve as reference points. 

Preparation: Double-stranded DNA is digested to
completion with the appropriate restriction enzymes,
phenol extracted and equilibrated to 10 mM Tris-HCl
(pH 8.0) and 1 mM EDTA.

Concentration: 1,000 µg/ml.

Storage Conditions: 10 mM Tris-HCl (pH 8.0)
and 1 mM EDTA. For long term storage, store at -20°C.
2-Log DNA Ladder is stable for at least 3 months at 4°C.

Note: All fragments have 4-base, 5' overhangs that can
be end labeled using T4 Polynucleotide Kinase (NEB
#M0201) or filled-in using DNA Polymerase I, Klenow
Fragment (NEB #M0210) (1). Use α-[32P] dATP or α-[32P]
dTTP for the fill-in reaction.

Usage Recommendation: We recommend loading
1 µg of the 2-Log DNA Ladder diluted in sample buffer.
This ladder was not designed for precise quantification of
DNA mass but can be used for approximating the mass of
DNA in comparably intense samples of similar size.



The approximate mass of DNA in each of the bands in
our 2-Log DNA Ladder is as follows (assuming a 1 µg
loading):






Fragment   Base Pairs   DNA Mass
1          10,002       40 ng
2          8,001        40 ng
3          6,001        48 ng
4          5,001        40 ng
5          4,001        32 ng
6          3,001        120 ng
7          2,017        40 ng
8          1,517        57 ng
9          1,200        45 ng
10         1,000        122 ng
11         900          34 ng
12         800          31 ng
13         700          27 ng
14         600          23 ng
15a        517          124 ng (15a and 15b combined)
15b        500
16         400          49 ng
17         300          37 ng
18         200          32 ng
19         100          61 ng
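
The band masses listed above sum to roughly 1,000 ng, which is consistent with a load of about 1 µg of ladder. The sketch below checks that total and looks up the ladder band nearest in size to a sample band as a starting point for a rough mass comparison. It assumes that band mass scales linearly with the amount of ladder loaded. The function name nearest_band is hypothetical, and fragments 15a and 15b are stored as a single 500 bp entry because the table gives them one shared mass.

```python
# Approximate mass (ng) in each band for the standard load, keyed by size in bp.
# Fragments 15a (517 bp) and 15b (500 bp) share one mass entry in the table.
BAND_MASS_NG = {
    10002: 40, 8001: 40, 6001: 48, 5001: 40, 4001: 32, 3001: 120,
    2017: 40, 1517: 57, 1200: 45, 1000: 122, 900: 34, 800: 31,
    700: 27, 600: 23, 500: 124, 400: 49, 300: 37, 200: 32, 100: 61,
}


def nearest_band(sample_bp, load_fraction=1.0):
    """Return (ladder_bp, mass_ng) for the ladder band closest in size.

    load_fraction scales the tabulated masses, e.g. 0.5 when half the
    standard amount of ladder was loaded (linear scaling is assumed).
    """
    bp = min(BAND_MASS_NG, key=lambda b: abs(b - sample_bp))
    return bp, BAND_MASS_NG[bp] * load_fraction


total_ng = sum(BAND_MASS_NG.values())
print(f"{len(BAND_MASS_NG)} bands, {total_ng} ng in total")
print(nearest_band(2900, load_fraction=0.5))
```

A sample band of similar size and comparable intensity to the returned ladder band would contain roughly the returned mass. As noted above, this is an approximation rather than a precise quantification.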